Frequency distribution models tuned to words and other linguistic events can predict the number of distinct types and their frequency distribution in samples of arbitrary sizes. We conduct, for the first time, a rigorous evaluation of these models based on cross-validation and separation of training and test data. Our experiments reveal that the prediction accuracy of the models is marred by serious overfitting problems, due to violations of the random sampling assumption in corpus data. We then propose a simple pre-processing method to alleviate such non-randomness problems. ...
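As a minimal illustration of the quantities such models work with (not the models evaluated in the paper), the following Python sketch computes the sample size N, the number of distinct types V, the frequency spectrum (how many types occur exactly m times), and the observed vocabulary size at several sample sizes. The function names and the toy token list are assumptions made for this example.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Return (N, V, spectrum) for a token sequence.

    N is the number of tokens, V the number of distinct types, and
    spectrum[m] the number of types that occur exactly m times.
    """
    type_counts = Counter(tokens)
    spectrum = Counter(type_counts.values())
    return len(tokens), len(type_counts), dict(spectrum)


def vocabulary_growth(tokens, sizes):
    """Observed number of distinct types in the first n tokens, for each n."""
    return [len(set(tokens[:n])) for n in sizes]


# Toy sample (hypothetical data, for illustration only).
tokens = "the cat sat on the mat the end".split()
n_tokens, n_types, spectrum = frequency_spectrum(tokens)
growth = vocabulary_growth(tokens, [2, 4, 8])
```

An evaluation with separate training and test data would fit a model on the spectrum of one portion of a corpus and compare its predicted vocabulary size against observed values like those returned by `vocabulary_growth` on held-out text.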
We introduce the zipfR package, a powerful and user-friendly open-source tool for LNRE modeling of word frequency distributions in the R statistical environment. We give some background on LNRE models, discuss related software and the motivation for the toolkit, describe the implementation, and conclude with a complete sample session showing a typical LNRE analysis.
When you have completed this chapter, you will be able to: Organize raw data into a frequency distribution; produce a histogram, a frequency polygon, and a cumulative frequency polygon from quantitative data; develop and interpret a stem-and-leaf display; present qualitative data using graphical techniques such as a clustered bar chart, a stacked bar chart, and a pie chart; detect graphic deceptions and use a graph to present data with clarity, precision, and efficiency.
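A stem-and-leaf display can be sketched in a few lines of Python. This is a minimal sketch that assumes non-negative integers, with the tens as the stem and the units digit as the leaf; the function name and the data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display as strings.

    Assumes non-negative integers: the stem is value // 10 and the
    leaf is value % 10.
    """
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]


# Hypothetical data, for illustration only.
rows = stem_and_leaf([22, 15, 30, 12, 29, 21])
```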
A collection of international scientific research reports in chemistry, for chemistry enthusiasts to consult. Topic: Algorithms for Blind Components Separation and Extraction from the Time-Frequency Distribution of Their Mixture
A collection of international scientific research reports in chemistry, for chemistry enthusiasts to consult. Topic: Research Article A Class of Highly Concentrated Time-Frequency Distributions Based on the Ambiguity Domain Representation and Complex-Lag Mome
A collection of international scientific research reports in chemistry, for chemistry enthusiasts to consult. Topic: Review Article Techniques to Obtain Good Resolution and Concentrated Time-Frequency Distributions: A Review
A collection of international scientific research reports in chemistry, for chemistry enthusiasts to consult. Topic: Research Article Computationally Efficient Scale Covariant Time-Frequency Distributions
We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant prior information on the length and frequency distributions of morphs in a language. Our algorithm is shown to outperform two competing algorithms, when evaluated on data from a language with agglutinative morphology (Finnish), and to perform well also on English data.
A stochastic model based on insights of Mandelbrot (1953) and Simon (1955) is discussed against the background of new criteria of adequacy that have become available recently as a result of studies of the similarity relations between words as found in large computerized text corpora.
We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership.
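The similarity measure named above, relative entropy between relative frequency distributions of contexts, can be sketched as follows. This is a minimal Python sketch and not the authors' clustering procedure; the function names, the epsilon floor for contexts unseen in the second distribution, and the toy context lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def relative_frequencies(contexts):
    """Map each observed context to its relative frequency."""
    counts = Counter(contexts)
    total = sum(counts.values())
    return {context: n / total for context, n in counts.items()}


def relative_entropy(p, q, epsilon=1e-9):
    """Relative entropy D(p || q) in bits.

    Contexts missing from q are floored at epsilon (an assumption of
    this sketch) so that the logarithm stays finite.
    """
    return sum(
        p_c * math.log2(p_c / max(q.get(c, 0.0), epsilon))
        for c, p_c in p.items()
        if p_c > 0
    )


# Hypothetical verb contexts for two nouns.
wine = relative_frequencies(["drink", "drink", "pour", "buy"])
beer = relative_frequencies(["drink", "pour", "pour", "buy"])
d_wine_beer = relative_entropy(wine, beer)
```

Relative entropy is asymmetric, so `relative_entropy(wine, beer)` and `relative_entropy(beer, wine)` generally differ; in a clustering setting the second argument would typically be a cluster's average context distribution.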
Chapter 2 - Describing data: Frequency tables, frequency distributions, and graphic presentation. After completing this unit, you should be able to: Organize qualitative data into a frequency table, present a frequency table as a bar chart or a pie chart, organize quantitative data into a frequency distribution, present a frequency distribution for quantitative data using histograms, frequency polygons, and cumulative frequency polygons.
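Organizing quantitative data into a frequency distribution can be sketched in Python as below. The sketch assumes equal-width classes that include their lower limit and exclude their upper limit; the function name, the class limits, and the data are hypothetical.

```python
from itertools import accumulate


def frequency_distribution(values, lower, width, n_classes):
    """Group values into equal-width classes [lo, hi).

    Returns a list of (lo, hi, frequency, cumulative_frequency) tuples.
    Raises ValueError if a value falls outside every class.
    """
    counts = [0] * n_classes
    for value in values:
        index = int((value - lower) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"value {value} falls outside the classes")
        counts[index] += 1
    cumulative = list(accumulate(counts))
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i], cumulative[i])
        for i in range(n_classes)
    ]


# Hypothetical data, for illustration only.
data = [12, 15, 21, 22, 29, 30, 34, 41, 47, 48]
table = frequency_distribution(data, lower=10, width=10, n_classes=4)
```

The frequency column gives the bar heights of a histogram, and the cumulative column gives the points of a cumulative frequency polygon.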
This paper quantitatively investigates the extent to which local context is useful for disambiguating the senses of an ambiguous word. This is done by comparing the co-occurrence frequencies of particular context words. First, one context word representing a certain sense is chosen, and then the co-occurrence frequencies with two other context words, one of the same and one of another sense, are compared. As expected, it turns out that context words belonging to the same sense have considerably higher co-occurrence frequencies than words belonging to different senses. ...
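The comparison described above can be illustrated with a small Python sketch. It counts co-occurrence at the sentence level, which is an assumption of this example rather than the paper's stated window; the function names and the toy sentences around the ambiguous word "bank" are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for each unordered word pair, the sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        for pair in combinations(sorted(set(sentence)), 2):
            counts[pair] += 1
    return counts


def cooccurrence(counts, word_a, word_b):
    """Look up the count of an unordered pair."""
    return counts[tuple(sorted((word_a, word_b)))]


# Hypothetical sentences; "money" and "loan" stand for the financial
# sense of "bank", "river" for the other sense.
sentences = [
    ["bank", "money", "loan"],
    ["money", "loan", "interest"],
    ["bank", "river", "water"],
    ["money", "river"],
]
counts = cooccurrence_counts(sentences)
same_sense = cooccurrence(counts, "money", "loan")
other_sense = cooccurrence(counts, "money", "river")
```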
This paper presents results from experiments in automatic classification of animacy for Norwegian nouns using decision-tree classifiers. The method makes use of relative frequency measures for linguistically motivated morphosyntactic features extracted from an automatically annotated corpus of Norwegian. The classifiers are evaluated using leave-one-out training and testing, and the initial results are promising (approaching 90% accuracy) for high-frequency nouns, but deteriorate gradually as lower-frequency nouns are classified.
Chapter 2 - Describing data: Frequency tables, frequency distributions, and graphic presentation. When you have completed this chapter, you will be able to: Organize qualitative data into a frequency table, present a frequency table as a bar chart or a pie chart, organize quantitative data into a frequency distribution, present a frequency distribution for quantitative data using histograms, frequency polygons, and cumulative frequency polygons.