  • This paper proposes a procedural pipeline for wind forecasting based on clustering and regression. First, the data are clustered into groups sharing similar dynamic properties. Then, data in the same cluster are used to train the neural network that predicts wind speed. For clustering, a hidden Markov model (HMM) and the modified Bayesian information criterion (BIC) are incorporated in a new method of clustering time series data.

  • The turn of the millennium has been described as the dawn of a new scientific revolution, which will have as great an impact on society as the industrial and computer revolutions before. This revolution was heralded by a large-scale DNA sequencing effort in July 1995, when the entire 1.8 million base pairs of the genome of the bacterium Haemophilus influenzae was published – the first of a free-living organism. Since then, the amount of DNA sequence data in publicly accessible databases has been growing exponentially, including a working draft of the complete 3.

  • We introduce a novel Bayesian approach for deciphering complex substitution ciphers. Our method uses a decipherment model which combines information from letter n-gram language models as well as word dictionaries. Bayesian inference is performed on our model using an efficient sampling technique. We evaluate the quality of the Bayesian decipherment output on simple and homophonic letter substitution ciphers and show that unlike a previous approach, our method consistently produces almost 100% accurate decipherments. ...

  • We present BAYESUM (for “Bayesian summarization”), a model for sentence extraction in query-focused summarization. BAYESUM leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BAYESUM is not afflicted by the paucity of information in short queries. We show that approximate inference in BAYESUM is possible on large data sets and results in a state-of-the-art summarization system.

  • Most information extraction (IE) systems identify facts that are explicitly stated in text. However, in natural language, some facts are implicit, and identifying them requires “reading between the lines”. Human readers naturally use common sense knowledge to infer such implicit information from the explicitly stated facts.

  • Fisher and Mahalanobis described Statistics as the key technology of the twentieth century. Since then, Statistics has evolved into a field that has many applications in all sciences and areas of technology, as well as in most areas of decision making such as health care, business, federal statistics and legal proceedings. Applications of statistics such as inference for causal effects, inference about spatio-temporal processes, analysis of categorical and survival data sets and countless other functions play an essential role in the present-day world.

  • objective or subjective, when making decisions under uncertainty. This is especially true when the consequences of the decisions can have a significant impact, financial or otherwise. Most of us make everyday personal decisions this way, using an intuitive process based on our experience and subjective judgments. Mainstream statistical analysis, however, seeks objectivity by generally restricting the information used in an analysis to that obtained from a current set of clearly relevant data.

  • Hindawi Publishing Corporation. EURASIP Journal on Advances in Signal Processing, Volume 2008, Article ID 317252, 14 pages. doi:10.1155/2008/317252. Research Article: Nonparametric Bayesian Filtering for Location Estimation, Position Tracking, and Global Localization of Mobile Terminals in Outdoor Wireless Environments. Mohamed Khalaf-Allah, Institute of Communications Engineering, Faculty of Electrical Engineering and Information Technology, Leibniz University of Hannover, Appelstrasse 9A, 30167 Hannover, Germany. Correspondence should be addressed to Mohamed Khalaf-Allah, mohamed.

  • In this work I address the challenge of augmenting n-gram language models according to prior linguistic intuitions. I argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle through which to address the problem, and demonstrate the approach by proposing a model for German compounds. In an empirical evaluation, the model outperforms the Kneser-Ney model in terms of perplexity, and achieves preliminary improvements in English-German translation.

  • Two models were used to apportion human cases to sources on the basis of sequence types: the modified Hald model and the Island model (12,15). The modified Hald model combines the prevalence of each C. jejuni sequence type among the sources with the observed number of human isolates of that type by using a Bayesian framework (15). This model includes source-specific and type-specific factors, and accounts for variation in the estimated prevalence.

  • Evaluating mutual fund performance is a topic of long-standing interest in the academic literature, but few if any studies have addressed the selection of an optimal portfolio of funds. Instead of using the historical data to estimate performance measures or produce fund rankings, this study uses the data to explore the mutual-fund investment decision.

  • To deal with the lack of reported information, we propose a novel approach to obtain the exposure contained in the net position in interest-rate derivatives. We specify a state space model of a bank’s derivatives trading strategy. We then use Bayesian methods to estimate the bank’s strategy using the joint distribution of interest rates, bank fair and notional values as well as bid-ask spreads. Intuitively, the identification of the bank’s strategy relies on whether the net position (per dollar notional) gains or loses in value over time, together with the history of rates.

  • CHAPTER 37 OLS With Random Constraint. A Bayesian considers the posterior density the full representation of the information provided by sample and prior information. Frequentists have discovered that one can interpret the parameters of this density as estimators of the key unknown parameters

  • This study, given its Bayesian approach, is related to the recent article by Baks, Metrick, and Wachter (2001), who estimate funds' alphas using informative prior beliefs about alpha. They investigate the degree to which informative priors can preclude an investor from inferring that at least one actively managed fund has a positive alpha. This inference relates to an investment problem of a mutual fund investor who can also earn the hypothetical costless returns on the benchmark indexes.

  • This book addresses state-of-the-art systems and achievements in various topics in the research field of speech and language technologies. Book chapters are organized in different sections covering diverse problems, which have to be solved in speech recognition and language understanding systems. In the first section machine translation systems based on large parallel corpora using rule-based and statistical-based translation methods are presented.

  • We investigate the relevance of hierarchical topic models to represent the content of Web gists. We focus our attention on DMOZ, a popular Web directory, and propose two algorithms to infer such a model from its manually-curated hierarchy of categories. Our first approach, based on information-theoretic grounds, uses an algorithm similar to recursive feature selection. Our second approach is fully Bayesian and derived from the more general model, hierarchical LDA.

  • This section develops an econometric framework that allows an investor to combine information in the data with prior beliefs about both pricing and skill. Nonbenchmark assets allow us to distinguish between pricing and skill, and they supply additional information about funds' expected returns. In addition, nonbenchmark assets help account for common variation in funds' returns, making the investment problem feasible using a large universe of funds.

  • Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. ...

  • Educators are interested in essay evaluation systems that include feedback about writing features that can facilitate the essay revision process. For instance, if the thesis statement of a student’s essay could be automatically identified, the student could then use this information to reflect on the thesis statement with regard to its quality, and its relationship to other discourse elements in the essay. Using a relatively small corpus of manually annotated data, we use Bayesian classification to identify thesis statements.

  • In this work, we develop and evaluate a wide range of feature spaces for deriving Levin-style verb classifications (Levin, 1993). We perform the classification experiments using Bayesian Multinomial Regression (an efficient log-linear modeling framework which we found to outperform SVMs for this task) with the proposed feature spaces. Our experiments suggest that subcategorization frames are not the most effective features for automatic verb classification. A mixture of syntactic information and lexical information works best for this task. ...

