Statistical computing

A complete practical tutorial for RStudio, designed with the needs of analysts and R developers in mind. Step-by-step examples apply the principles of reproducible research and good programming practices to R projects. Learn to generate reports, create graphics, perform analysis, and even build R packages with RStudio.

This series aims to capture new developments and summarize what is known over the whole spectrum of mathematical and computational biology and medicine. It seeks to encourage the integration of mathematical, statistical and computational methods into biology by publishing a broad range of textbooks, reference works and handbooks. The titles included in the series are meant to appeal to students, researchers and professionals in the mathematical, statistical and computational sciences, fundamental biology and bioengineering, as well as interdisciplinary researchers involved in the field....

This paper describes an extension to the hidden Markov model for part-of-speech tagging using second-order approximations for both contextual and lexical probabilities. This model increases the accuracy of the tagger to state-of-the-art levels. These approximations make use of more contextual information than standard statistical systems. New methods of smoothing the estimated probabilities are also introduced to address the sparse data problem.

This paper reports the ongoing research of a thesis project investigating a computational model of early language acquisition. The model discovers word-like units from cross-modal input data and builds continuously evolving internal representations within a cognitive model of memory. Current cognitive theories suggest that young infants employ general statistical mechanisms that exploit the statistical regularities within their environment to acquire language skills.

This paper presents a comparative study of five parameter estimation algorithms on four NLP tasks. Three of the five algorithms are well-known in the computational linguistics community: Maximum Entropy (ME) estimation with L2 regularization, the Averaged Perceptron (AP), and Boosting. We also investigate ME estimation with L1 regularization using a novel optimization algorithm, and BLasso, which is a version of Boosting with Lasso (L1) regularization. We first investigate all of our estimators on two reranking tasks: a parse selection task and a language model (LM) adaptation task. ...

In this paper we describe a novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations. We detail the computational complexity and average retrieval times for looking up phrase translations in our suffix array-based data structure. We show how sampling can be used to reduce the retrieval time by orders of magnitude with no loss in translation quality. ...
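As a rough illustration of the general idea behind suffix-array phrase lookup (not the data structure from the paper itself, whose details are not given in this abstract), the sketch below indexes a tokenized corpus and finds every occurrence of a phrase of any length by binary search. The function names and the toy corpus are assumptions made for this example, and the sampling step mentioned in the abstract is not shown.

```python
def build_suffix_array(tokens):
    """Return the start positions of all suffixes, sorted lexicographically.

    Naive construction for illustration only; real systems use faster
    construction algorithms.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens, suffix_array, phrase):
    """Return the sorted corpus positions where `phrase` occurs."""
    n = len(phrase)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + n]

    # Lower bound: first suffix whose n-token prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose n-token prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


# Hypothetical toy corpus.
corpus = "the cat sat on the mat near the cat".split()
sa = build_suffix_array(corpus)
matches = find_phrase(corpus, sa, ["the", "cat"])
```

Because all suffixes that begin with the same phrase are contiguous in the sorted order, a phrase of any length is located with two binary searches instead of being stored explicitly in a phrase table.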

The search space of Phrase-Based Statistical Machine Translation (PBSMT) systems can be represented in the form of a directed acyclic graph (lattice). The quality of this search space can thus be evaluated by computing the best achievable hypothesis in the lattice, the so-called oracle hypothesis. For common SMT metrics, however, this problem is NP-hard and can only be solved using heuristics.

Think Bayes is an introduction to Bayesian statistics using computational methods and the Python programming language. Bayesian statistics are usually presented mathematically, but many of the ideas are easier to understand computationally. Contents: Bayes's Theorem; Computational statistics; Tanks and Trains; Urns and Coins; Odds and addends; Hockey; The variability hypothesis; Hypothesis testing.
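To give a sense of what "understanding Bayes's theorem computationally" can look like, here is a minimal sketch of a discrete Bayesian update. It is not code from the book; the hypothesis names and the numbers are made up for illustration.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H | data) for each hypothesis H.

    prior:      dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(data | H)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data)
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical example: two urns chosen with equal probability.
# A white ball is drawn; urn_1 holds 75% white balls, urn_2 holds 50%.
prior = {"urn_1": 0.5, "urn_2": 0.5}
likelihood = {"urn_1": 0.75, "urn_2": 0.5}
posterior = bayes_update(prior, likelihood)
```

With these assumed numbers the posterior probability of urn_1 is 0.375 / 0.625 = 0.6, so observing a white ball shifts belief toward the urn that contains more of them.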

As part of its new Digital Government program, the National Science Foundation (NSF) requested that the Computer Science and Telecommunications Board (CSTB) undertake an in-depth study of how information technology research and development could more effectively support advances in the use of information technology in government.

This textbook was designed and developed to provide health care students, primarily health information management and health information technology students, and health care professionals with a rudimentary understanding of the terms, definitions, and formulae used in computing health care statistics and to provide self-testing opportunities and applications of the statistical formulae.

We tackle the previously unaddressed problem of unsupervised determination of the optimal morphological segmentation for statistical machine translation (SMT) and propose a segmentation metric that takes into account both sides of the SMT training corpus. We formulate the objective function as the posterior probability of the training corpus according to a generative segmentation-translation model. We describe how the IBM Model-1 translation likelihood can be computed incrementally between adjacent segmentation states for efficient computation. ...

Statistical models in machine translation exhibit spurious ambiguity. That is, the probability of an output string is split among many distinct derivations (e.g., trees or segmentations). In principle, the goodness of a string is measured by the total probability of its many derivations. However, finding the best string (e.g., during decoding) is then computationally intractable. Therefore, most systems use a simple Viterbi approximation that measures the goodness of a string using only its most probable derivation.
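The difference between the two scoring rules can be seen on a toy example. The sketch below is only an illustration with made-up strings and probabilities; it enumerates derivations explicitly, which is exactly what becomes intractable in a real decoder.

```python
from collections import defaultdict

# Hypothetical derivations: (output string, derivation id, probability).
derivations = [
    ("the house", "d1", 0.20),
    ("the house", "d2", 0.15),
    ("the house", "d3", 0.10),
    ("a house", "d4", 0.30),
    ("a house", "d5", 0.05),
]


def score_strings(derivs):
    """Score each string by total probability and by its best derivation."""
    total = defaultdict(float)    # sum over all derivations of the string
    viterbi = defaultdict(float)  # probability of the single best derivation
    for string, _, prob in derivs:
        total[string] += prob
        viterbi[string] = max(viterbi[string], prob)
    return dict(total), dict(viterbi)


total, viterbi = score_strings(derivations)
best_by_total = max(total, key=total.get)
best_by_viterbi = max(viterbi, key=viterbi.get)
```

Here "the house" has the larger total probability (0.45 versus 0.35), but the Viterbi approximation prefers "a house" because its single best derivation (0.30) outscores any one derivation of "the house" (0.20).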

State-of-the-art computer-assisted translation engines are based on a statistical prediction engine, which interactively provides completions to what a human translator types. The integration of human speech into a computer-assisted system is also a challenging area and is the aim of this paper. So far, only a few methods for integrating statistical machine translation (MT) models with automatic speech recognition (ASR) models have been studied. They were mainly based on an N-best rescoring approach. ...

We describe a new loss function, due to Jeon and Lin (2006), for estimating structured log-linear models on arbitrary features. The loss function can be seen as a (generative) alternative to maximum likelihood estimation with an interesting information-theoretic interpretation, and it is statistically consistent. It is substantially faster than maximum (conditional) likelihood estimation of conditional random fields (Lafferty et al., 2001; an order of magnitude or more).

In this paper we focus on how to improve pronoun resolution using statistics-based semantic compatibility information. We investigate two unexplored issues that influence the effectiveness of such information: statistics source and learning framework. Specifically, we propose, for the first time, to utilize the web and the twin-candidate model, in addition to the previous combination of the corpus and the single-candidate model, to compute and apply the semantic information.

Scientific report: "A Comparative Study on Reordering Constraints in Statistical Machine Translation"
In statistical machine translation, the generation of a translation hypothesis is computationally expensive. If arbitrary word reorderings are permitted, the search problem is NP-hard. On the other hand, if we restrict the possible word reorderings in an appropriate way, we obtain a polynomial-time search algorithm. In this paper, we compare two different reordering constraints, namely the ITG constraints and the IBM constraints.

The processes through which readers evoke mental representations of phonological forms from print constitute a hotly debated and controversial issue in current psycholinguistics. In this paper we present a computational analysis of the graphophonological system of written French, and an empirical validation of some of the obtained descriptive statistics.

This paper introduces a novel generation system that composes human-like descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date. ...

Since they cluster terms through statistical measures of context similarities, these tools exploit recurring situations. Since single-word terms denote broader concepts than multi-word terms, they appear more frequently in corpora and are therefore more appropriate for statistical clustering. The contribution of this paper is to propose an integrated platform for computer-aided term extraction and structuring that results from the combination of LEXTER, a Term Extraction tool (Bourigault et al., 1996), and FASTR, a Term Normalization tool (Jacquemin et al., 1997). ...

Computer simulation is used to reduce the risk associated with creating new systems or with making changes to existing ones. More than ever, modern organizations want assurance that investments will produce the expected results. For instance, an assembly line may be required to produce a particular number of autos during an eight-hour shift. Complex, interacting factors influence operation, so powerful tools are needed to develop an accurate analysis.