Probability and statistics are concerned with events which occur by chance. Examples
include occurrence of accidents, errors of measurements, production of defective and
nondefective items from a production line, and various games of chance, such as
drawing a card from a well-mixed deck, flipping a coin, or throwing a symmetrical
six-sided die. In each case we may have some knowledge of the likelihood of various
possible results, but we cannot predict with any certainty the outcome of any particular
If you need to create and interpret statistics in business or classroom settings, this easy-to-use guide is just what you need. It shows you how to use Excel's powerful tools for statistical analysis, even if you've never taken a course in statistics. Learn the meaning of terms like mean and median, margin of error, standard deviation, and permutations, and discover how to interpret the statistics of everyday life. You'll learn to use Excel formulas, charts, PivotTables, and other tools to make sense of everything from sports stats to medical correlations....
Sampling and descriptive statistics, probability, propagation of error, commonly used distributions, confidence intervals, hypothesis testing, correlation and simple linear regression, multiple regression,... are the main contents of the ebook "Statistics for Engineers and Scientists".
Most statistical machine translation systems employ a word-based alignment model. In this paper we demonstrate that word-based alignment is a major cause of translation errors. We propose a new alignment model based on shallow phrase structures, which can be acquired automatically from a parallel corpus. This new model achieved over 10% error reduction for our spoken language translation task.
Nowadays, digital terrain models (DTM) are an important source of spatial data for various applications in many scientific disciplines. Therefore, special attention is given to their main characteristic: accuracy. As is well known, the source data for DTM creation contributes a large number of errors, including gross errors, to the final product.
Medical Statistics at a Glance is directed at undergraduate
medical students, medical researchers, postgraduates in the
biomedical disciplines and at pharmaceutical industry personnel.
All of these individuals will, at some time in their
professional lives, be faced with quantitative results (their
own or those of others) that will need to be critically
evaluated and interpreted, and some, of course, will have to
pass that dreaded statistics exam! A proper understanding
of statistical concepts and methodology is invaluable for
Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. In this paper, we analyze various training criteria which directly optimize translation quality. These training criteria make use of recently proposed automatic evaluation metrics.
This paper focuses on the analysis and prediction of so-called aware sites, defined as turns where a user of a spoken dialogue system first becomes aware that the system has made a speech recognition error. We describe statistical comparisons of features of these aware sites in a train timetable spoken dialogue corpus, which reveal significant prosodic differences between such turns, compared with turns that ‘correct’ speech recognition errors as well as with ‘normal’ turns that are neither aware sites nor corrections. ...
It is important to correct the errors in the results of speech recognition to increase the performance of a speech translation system. This paper proposes a method for correcting errors using the statistical features of character co-occurrence, and evaluates the method. The proposed method comprises two successive correcting processes. The first process uses pairs of strings: the first string is an erroneous substring of the utterance predicted by speech recognition, the second string is the corresponding section of the actual utterance.
We present a novel OCR error correction method for languages without word delimiters that have a large character set, such as Japanese and Chinese. It consists of a statistical OCR model, an approximate word matching method using character shape similarity, and a word segmentation algorithm using a statistical language model. By using a statistical OCR model and character shape similarity, the proposed error corrector outperforms the previously published method. When the baseline character recognition accuracy is 90%, it achieves 97.4% character recognition accuracy. ...
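To put the figures quoted above in perspective, the move from 90% to 97.4% character accuracy can be restated as a relative reduction in character error rate. The sketch below is only arithmetic on those two numbers; the function name is hypothetical and does not come from the paper.

```python
def relative_error_reduction(baseline_accuracy: float, improved_accuracy: float) -> float:
    """Fraction of baseline errors removed, given two accuracies in [0, 1]."""
    baseline_error = 1.0 - baseline_accuracy
    improved_error = 1.0 - improved_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline accuracy must be below 1.0")
    return (baseline_error - improved_error) / baseline_error


# Accuracies quoted in the abstract: 90% baseline, 97.4% after correction.
reduction = relative_error_reduction(0.90, 0.974)
print(f"{reduction:.1%}")
```

Under this reading, the error rate falls from 10% to 2.6%, so roughly three quarters of the baseline character errors are removed.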
Statistical methods require a very large, high-quality corpus. But building a large and faultless annotated corpus is a very difficult job. This paper proposes an efficient method to construct a part-of-speech tagged corpus. A rule-based error correction method is proposed to find and correct errors semi-automatically by user-defined rules. We also make use of the user's correction log to reflect feedback. Experiments were carried out to show the efficiency of the error correction process of this workbench. The result shows that about 63.2% of tagging errors can be corrected. ...
In this thesis proposal I present my thesis work, about pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches.
Minimum Error Rate Training (MERT) and Minimum Bayes-Risk (MBR) decoding are used in most current state-of-the-art Statistical Machine Translation (SMT) systems. The algorithms were originally developed to work with N-best lists of translations, and recently extended to lattices that encode many more hypotheses than typical N-best lists. We here extend lattice-based MERT and MBR algorithms to work with hypergraphs that encode a vast number of translations produced by MT systems based on Synchronous Context Free Grammars.
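As a rough illustration of the N-best starting point mentioned above, MBR decoding picks the hypothesis with the highest expected gain (equivalently, the lowest expected loss) under the model's posterior over the list. The sketch below is not the paper's lattice or hypergraph algorithm: the gain function `unigram_f1` is a toy stand-in for a BLEU-style metric, and the hypotheses and probabilities are made up for the example.

```python
from collections import Counter
from typing import List, Tuple


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy gain function: F1 over bag-of-words overlap (assumed, not BLEU)."""
    hyp_counts, ref_counts = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(nbest: List[Tuple[str, float]]) -> str:
    """Return the hypothesis with the highest expected gain over the N-best list."""
    total = sum(p for _, p in nbest)  # normalize scores into a posterior

    def expected_gain(candidate: str) -> float:
        return sum((p / total) * unigram_f1(candidate, other) for other, p in nbest)

    return max((hyp for hyp, _ in nbest), key=expected_gain)


# Made-up N-best list: (hypothesis, model probability).
nbest = [
    ("the cat sat on the mat", 0.4),
    ("a cat sat on the mat", 0.3),
    ("a cat sat on a mat", 0.3),
]
best = mbr_decode(nbest)
print(best)
```

In this toy list the MBR choice is the second hypothesis rather than the most probable one, because it is the closest on average to the other candidates.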
The development of Dialog-Based Computer-Assisted Language Learning (DB-CALL) systems requires research on the simulation of language learners. This paper presents a new method for generation of grammar errors, an important part of the language learner simulator. Realistic errors are generated via Markov Logic, which provides an effective way to merge a statistical approach with expert knowledge about the grammar error characteristics of language learners. Results suggest that the distribution of simulated grammar errors generated by the proposed model is similar to that of real learners.
We describe a statistical technique for assigning senses to words. An instance of a word is assigned a sense by asking a question about the context in which the word appears. The question is constructed to have high mutual information with the translation of that instance in another language. When we incorporated this method of assigning senses into our statistical machine translation system, the error rate of the system decreased by thirteen percent. ... language model does not realize that "take my own decision" is improbable because "take" and "decision" no longer fall within a single trigram. ...
We evaluate measures of contextual fitness on the task of detecting real-word spelling errors. For that purpose, we extract naturally occurring errors and their contexts from the Wikipedia revision history. We show that such natural errors are better suited for evaluation than the previously used artificially created errors. In particular, the precision of statistical methods has been largely over-estimated, while the precision of knowledge-based approaches has been under-estimated.
Part 1 of the book "A handbook of applied statistics in pharmacology" presents the following contents: probability, distribution; mean, mode, median; variance, standard deviation, standard error, coefficient of variation; analysis of normality and homogeneity of variance; transformation of data and outliers; tests for significant differences,...
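The descriptive quantities listed above can be computed with Python's standard library. The following is a minimal sketch, assuming the usual sample (n - 1) definitions of variance and standard deviation; the function name `describe` and the sample values are made up for illustration and are not taken from the book.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(sample: Sequence[float]) -> Dict[str, float]:
    """Common descriptive statistics for a sample with a nonzero mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # sample variance, n - 1 denominator
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "mode": statistics.mode(sample),
        "median": statistics.median(sample),
        "variance": variance,
        "standard_deviation": sd,
        "standard_error": sd / math.sqrt(n),        # SD of the sample mean
        "cv_percent": 100.0 * sd / mean,            # coefficient of variation
    }


# Made-up measurements, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The coefficient of variation is only meaningful for data on a ratio scale with a nonzero mean, which is why the sketch assumes one.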
Annotation schemes in different projects are usually different, since the underlying linguistic theories vary and have different ways to explain the same language phenomena. Though statistical NLP systems usually are not bound to specific annotation standards, almost all of them assume homogeneous annotation in the training corpus.
We describe Akamon, an open source toolkit for tree and forest-based statistical machine translation (Liu et al., 2006; Mi et al., 2008; Mi and Huang, 2008). Akamon implements all of the algorithms required for tree/forest-to-string decoding using tree-to-string translation rules: multiple-thread forest-based decoding, n-gram language model integration, beam- and cube-pruning, k-best hypotheses extraction, and minimum error rate training.
Automatic error detection is desired in post-processing to improve machine translation quality. Previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N-best lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features.