# Statistical model comparison

• ### Scientific report: "N-gram-based Statistical Machine Translation versus Syntax Augmented Machine Translation: comparison and system combination"

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximum-entropy framework. ...

• ### CHEMOMETRICS IN PRACTICAL APPLICATIONS

In the book "Chemometrics in practical applications", various practical applications of chemometric methods in chemistry, biochemistry and chemical technology are presented, and selected chemometric methods are described in tutorial style. The book contains 14 independent chapters and is devoted to filling the gap between textbooks on multivariate data analysis and research journals on chemometrics and chemoinformatics.

• ### STATISTICS AND DATA ANALYSIS FOR THE BEHAVIORAL SCIENCES

Reference book "Statistics and Data Analysis for the Behavioral Sciences", filed under natural sciences and physics, intended to support effective study, research, and work.

• ### CGFS Papers No 35 Credit risk transfer statistics

Given the variation in credit scoring methodologies, raw credit scores possess no intrinsic meaning, and comparing raw scores across companies is of limited value. Normalized or "standardized" results afford more meaningful comparisons. Averaged across all companies, the spread in standardized scores between "no minority" and "all minority" ZIP Codes was 38.9 percentiles, a very considerable gap.

• ### Scientific report: "A Comparison of Merging Strategies for Translation of German Compounds"

In this article, compound processing for translation into German in a factored statistical MT system is investigated. Compounds are handled by splitting them prior to training, and merging the parts after translation. I have explored eight merging strategies using different combinations of external knowledge sources, such as word lists, and internal sources that are carried through the translation process, such as symbols or parts-of-speech. I show that for merging to be successful, some internal knowledge source is needed.

• ### Scientific report: "Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability"

In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, he runs an experiment to evaluate the behavior of the two systems on held-out data. In this paper, we consider how to make such experiments more statistically reliable.
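One common way to test whether a system beats a baseline on held-out data is a paired approximate randomization test. The sketch below is only an illustration of that general idea, not the procedure proposed in the paper: it assumes each system has a per-sentence quality score that can be averaged (corpus-level metrics such as BLEU would need to be recomputed per shuffle instead), and the function name and parameters are hypothetical.

```python
import random


def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired approximate randomization test.

    scores_a, scores_b: per-sentence scores of two systems on the same
    held-out sentences (an assumption of this sketch).
    Returns an estimated p-value for the difference in mean score.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n

    at_least_as_extreme = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so swap each sentence's pair of scores with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) / n >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A small p-value suggests the observed difference is unlikely under random relabeling. It says nothing about variance between optimizer runs, which would require repeating the comparison over several independently tuned systems.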

• ### Scientific report: "A Comparison of Event Models for Naive Bayes Anti-Spam E-Mail Filtering"

We describe experiments with a Naive Bayes text classifier in the context of anti-spam E-mail filtering, using two different statistical event models: a multi-variate Bernoulli model and a multinomial model. We introduce a family of feature ranking functions for feature selection in the multinomial event model that take account of the word frequency information. We present evaluation results on two publicly available corpora of legitimate and spam E-mails.
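The difference between the two event models can be seen in a small sketch. The Bernoulli model scores every vocabulary word as present or absent in the message, while the multinomial model scores each token occurrence and so uses word frequency. The code below is a minimal illustration with add-one smoothing on a made-up toy corpus; the function names and the data are assumptions, and it does not reproduce the paper's feature ranking functions or experiments.

```python
import math
from collections import Counter


def train(docs, labels, vocab):
    """Estimate per-class parameters for both event models (add-one smoothing)."""
    params = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        n_docs = len(class_docs)
        # Bernoulli: P(word appears at least once in a message | class)
        doc_freq = Counter(w for d in class_docs for w in set(d))
        bern = {w: (doc_freq[w] + 1) / (n_docs + 2) for w in vocab}
        # Multinomial: P(a token is this word | class)
        term_freq = Counter(w for d in class_docs for w in d)
        total = sum(term_freq[w] for w in vocab)
        multi = {w: (term_freq[w] + 1) / (total + len(vocab)) for w in vocab}
        params[c] = {"prior": n_docs / len(docs), "bern": bern, "multi": multi}
    return params


def log_score_bernoulli(doc, p, vocab):
    """Sum over the whole vocabulary: absent words contribute too."""
    present = set(doc)
    score = math.log(p["prior"])
    for w in vocab:
        prob = p["bern"][w]
        score += math.log(prob if w in present else 1.0 - prob)
    return score


def log_score_multinomial(doc, p, vocab):
    """Sum over tokens: repeated words count once per occurrence."""
    score = math.log(p["prior"])
    for w in doc:
        if w in p["multi"]:
            score += math.log(p["multi"][w])
    return score


def classify(doc, params, vocab, score_fn):
    return max(params, key=lambda c: score_fn(doc, params[c], vocab))


# Toy corpus (hypothetical data, for illustration only).
docs = [["free", "money", "free"], ["meeting", "agenda"],
        ["free", "offer"], ["agenda", "notes", "meeting"]]
labels = ["spam", "ham", "spam", "ham"]
vocab = sorted({w for d in docs for w in d})
params = train(docs, labels, vocab)
```

With these parameters, `classify(["free", "free", "offer"], params, vocab, log_score_multinomial)` and the Bernoulli variant both choose `"spam"`, but only the multinomial score changes when a word is repeated.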

• ### Scientific report: "Comparison of Alignment Templates and Maximum Entropy Models for Natural Language Understanding"

In this paper we compare two approaches to natural language understanding (NLU). The first approach is derived from the field of statistical machine translation (MT), whereas the other uses the maximum entropy (ME) framework. Starting with an annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal language target sentence. We mainly focus on the quality of the different alignment and ME models and show that the direct ME approach outperforms the alignment templates method. ...

• ### Lecture Business statistics in practice (7/e): Chapter 16 - Bowerman, O'Connell, Murphree

Chapter 16 - Time series forecasting and index numbers. This chapter covers: time series components and models, time series regression, multiplicative decomposition, simple exponential smoothing, Holt-Winters' models, the Box-Jenkins methodology (optional advanced section), forecast error comparisons, and index numbers.
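As an illustration of two of the listed topics, the sketch below implements simple exponential smoothing and a mean squared error for comparing forecasts. It is a generic textbook-style version, not taken from the lecture: initializing the level to the first observation is one common convention among several, and the function names are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts for a series.

    forecasts[t] is the forecast for period t made before observing series[t];
    the final element is the forecast for the period after the last observation.
    The initial level is set to the first observation (an assumed convention).
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    forecasts = [level]
    for y in series:
        # New level = weighted average of the latest observation and old level.
        level = alpha * y + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


def mean_squared_error(series, forecasts):
    """Average squared one-step-ahead error over the observed periods."""
    errors = [(y - f) ** 2 for y, f in zip(series, forecasts)]
    return sum(errors) / len(errors)
```

Running the smoother with several values of `alpha` and comparing the resulting `mean_squared_error` values is one simple way to compare forecast errors across settings.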

• ### Class Notes in Statistics and Econometrics Part 22

Chapter 43: Multiple Comparisons in the Linear Model. Due to the isomorphism of tests and confidence intervals, we will keep this whole discussion in terms of confidence intervals. 43.1. Rectangular Confidence Regions. Assume you are interested in two linear combinations of β at the same time

• ### Scientific report: "Predicting the fluency of text with shallow structural features: case studies of machine translation and human-written text"

Sentence fluency is an important component of overall text readability but few studies in natural language processing have sought to understand the factors that define it. We report the results of an initial study into the predictive power of surface syntactic statistics for the task; we use fluency assessments done for the purpose of evaluating machine translation. We find that these features are weakly but significantly correlated with fluency. Machine and human translations can be distinguished with accuracy over 80%.

• ### Handbook of Econometrics Vols 1-5: Chapter 15

Chapter 15: Approximating the Distributions of Econometric Estimators and Test Statistics, by Thomas J. Rothenberg. The comparison of different hypotheses, i.e. of competing models, is the basis of model specification. It may be performed along two main lines.

In order to address this question, this paper seeks to take a closer look at the nature and determinants of competition within the EAC banking sector. Our main objective is to empirically estimate the degree of competition in the EAC banking systems. We do this by estimating two nonstructural measures of bank pricing behavior, the Lerner index and the Panzar and Rosse H-statistic.
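For orientation, the two measures named above are usually written as follows. These are the standard textbook definitions, not formulas quoted from the paper, and the symbols are assumptions: $P$ is the output price, $MC$ the marginal cost, $R$ bank revenue, and $w_k$ the price of input $k$.

$$L = \frac{P - MC}{P}, \qquad H = \sum_{k} \frac{\partial \ln R}{\partial \ln w_k}$$

A Lerner index near 0 indicates pricing close to marginal cost, while larger values indicate more market power. In the usual reading of the H-statistic, $H = 1$ is consistent with perfect competition, $0 < H < 1$ with monopolistic competition, and $H \le 0$ with monopoly.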

• ### Scientific report: "Should we Translate the Documents or the Queries in Cross-language Information Retrieval?"

Previous comparisons of document and query translation suffered difficulty due to differing quality of machine translation in these two opposite directions. We avoid this difficulty by training identical statistical translation models for both translation directions using the same training data. We investigate information retrieval between English and French, incorporating both translation directions into both document translation and query translation-based information retrieval, as well as into hybrid systems. ...

• ### Report: "Simulation study of microscopic bubbles in amorphous alloy $Co_{81.5}B_{18.5}$"

Diffusion via microscopic bubbles in amorphous materials is simulated using statistical relaxation models of $Co_{81.5}B_{18.5}$ containing $2\times 10^5$ atoms. The work focuses on the role of these bubbles in self-diffusion in amorphous solids. The number of vacancy bubbles in amorphous $Co_{81.5}B_{18.5}$ was found to vary from $1.4\times 10^{-3}$ to $4\times 10^{-3}$ per atom, depending on the degree of relaxation. The simulation shows that atomic movement during diffusion is collective in character.