Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities.
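To make the scale concrete: the number of ways to group n mentions into distinct entities is the number of set partitions of n items, the Bell number B_n, which grows faster than exponentially. The sketch below is an illustration and not part of the original work. It computes Bell numbers with the Bell triangle, and the function name `bell_numbers` is hypothetical.

```python
def bell_numbers(n_max: int) -> list[int]:
    """Return [B_0, B_1, ..., B_n_max], built row by row with the Bell triangle."""
    if n_max < 0:
        raise ValueError("n_max must be non-negative")
    bells = [1]  # B_0 = 1: the empty set has exactly one partition
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row...
        new_row = [row[-1]]
        # ...and every further entry adds its left neighbour to the entry above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # the first entry of row n is B_n
    return bells


if __name__ == "__main__":
    bells = bell_numbers(20)
    for n in (5, 10, 20):
        print(f"{n} mentions -> {bells[n]} possible groupings into entities")
```

Even 20 mentions admit roughly 5 × 10^13 groupings, which is why exhaustive search over partitions is ruled out.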
We investigate the relevance of hierarchical topic models for representing the content of Web gists. We focus our attention on DMOZ, a popular Web directory, and propose two algorithms to infer such a model from its manually curated hierarchy of categories. Our first approach, based on information-theoretic grounds, uses an algorithm similar to recursive feature selection. Our second approach is fully Bayesian and derived from a more general model, hierarchical LDA.
Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem: building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model.
A solution to the problem of identifying homographs (words with multiple distinct meanings) is proposed and evaluated in this paper. It is demonstrated that a mixture-model-based framework is better suited to this task than standard classification algorithms: a relative improvement of 7% in F1 measure and 14% in Cohen's kappa score is observed.
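For readers less familiar with the second metric: Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance, κ = (p_o − p_e) / (1 − p_e). A minimal sketch follows, assuming two equal-length label sequences (for example system output and gold labels). The function names are illustrative, and `relative_improvement` assumes "relative improvement" means (new − baseline) / baseline.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters (or a system and a gold standard)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that both pick the same label independently,
    # using each rater's own label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def relative_improvement(baseline: float, new: float) -> float:
    """(new - baseline) / baseline, e.g. 0.14 for a 14% relative gain."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    gold = ["homograph", "homograph", "other", "other"]
    pred = ["homograph", "other", "other", "other"]
    print(cohens_kappa(gold, pred))  # 0.5
```

In the usage example, p_o = 0.75 and p_e = 0.5, so kappa is 0.5 even though raw accuracy is 75%.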
This book was conceived as a result of many years of research with students and postdocs in molecular simulation, and shaped over several courses on the subject given at the University of Groningen, the Eidgenössische Technische Hochschule (ETH) in Zürich, the University of Cambridge, UK, the University of Rome (La Sapienza), and the University of North Carolina at Chapel Hill, NC, USA.
A collection of international scientific research reports in chemistry, for chemistry enthusiasts to consult. Topic: Research Article: Bayesian Hierarchical Model for Estimating Gene Expression Intensity Using Multiple Scanned Microarrays
A collection of international scientific research reports in medicine, for your reference. Topic: Dimensional and hierarchical models of depression using the Beck Depression Inventory-II in an Arab college student sample...
Methods that measure compatibility between mention pairs are currently the dominant approach to coreference. However, they suffer from a number of drawbacks including difficulties scaling to large numbers of mentions and limited representational power. As these drawbacks become increasingly restrictive, the need to replace the pairwise approaches with a more expressive, highly scalable alternative is becoming urgent.
Department of Cognitive and Linguistic Sciences, Brown University, Providence, RI, USA
correction, the approximation is poor for hierarchical models, which are commonly used for NLP applications. We derive an improved O(1) formula that gives exact values for the expected counts in non-hierarchical models. For hierarchical models, where our formula is not exact, we present an efficient method for sampling from the HDP (and related models, such as the hierarchical Pitman-Yor process) that considerably decreases the memory footprint of such models as compared to the naive implementation. ...
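As background on the expected counts in question, and not necessarily the formula derived in the paper: in a single (non-hierarchical) Chinese restaurant process with concentration α, customer i (counting from 0) opens a new table with probability α/(α+i), so the expected number of tables after n customers is E[T_n] = Σ_{i=0}^{n−1} α/(α+i) = α(ψ(α+n) − ψ(α)), where ψ is the digamma function. The sketch compares the O(n) sum with the closed form. The hand-rolled `digamma` is an assumption made to stay within the standard library; `scipy.special.digamma` would be the usual choice.

```python
import math


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x: shift x upward until the asymptotic series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def expected_tables_by_sum(alpha: float, n: int) -> float:
    """E[tables] after n customers: customer i (0-based) opens a table w.p. alpha/(alpha+i)."""
    return sum(alpha / (alpha + i) for i in range(n))


def expected_tables_closed_form(alpha: float, n: int) -> float:
    """Same expectation via alpha * (psi(alpha + n) - psi(alpha)); the cost does not grow with n."""
    return alpha * (digamma(alpha + n) - digamma(alpha))


if __name__ == "__main__":
    for alpha, n in [(0.5, 100), (10.0, 1000)]:
        print(alpha, n, expected_tables_by_sum(alpha, n), expected_tables_closed_form(alpha, n))
```

For α = 1 both expressions reduce to the harmonic number H_n, which is a quick sanity check.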
We describe two probabilistic models for unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002), which uses both parallel text and a sense inventory for the target language, and recasts their approach in a probabilistic framework. The second model, which we call the Concept model, is a hierarchical model that uses a concept latent variable to relate different language-specific sense labels.
We present a document compression system that uses a hierarchical noisy-channel model of text production. Our compression system first automatically derives the syntactic structure of each sentence and the overall discourse structure of the text given as input. The system then uses a statistical hierarchical model of text production in order to drop non-important syntactic and discourse constituents so as to generate coherent, grammatical document compressions of arbitrary length.
Appendix E - Hierarchical model. This chapter presents the following content: Basic concepts, tree-structure diagrams, data-retrieval facility, update facility, virtual records, mapping of hierarchies to files, the IMS database system.
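The tree-structure and data-retrieval ideas listed above can be illustrated with a small in-memory sketch: records form a rooted tree and are visited in preorder (parent before children, subtrees left to right), the "hierarchical sequence" order commonly described for IMS-style retrieval. The record types, field names, and `find_first` are hypothetical illustrations, not the IMS DL/I interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class Record:
    """One record in a hierarchical (tree-structured) database."""
    record_type: str
    data: dict
    children: list["Record"] = field(default_factory=list)


def preorder(record: Record) -> Iterator[Record]:
    """Yield records parent first, then each subtree from left to right."""
    yield record
    for child in record.children:
        yield from preorder(child)


def find_first(root: Record, record_type: str,
               predicate: Callable[[Record], bool] = lambda r: True) -> Optional[Record]:
    """Return the first record of the given type, in preorder, that satisfies predicate."""
    for record in preorder(root):
        if record.record_type == record_type and predicate(record):
            return record
    return None


# Hypothetical instance: each customer is a child of a dummy root and owns its accounts.
db = Record("root", {}, [
    Record("customer", {"name": "Hayes"}, [
        Record("account", {"number": "A-102", "balance": 400}),
        Record("account", {"number": "A-220", "balance": 350}),
    ]),
    Record("customer", {"name": "Johnson"}, [
        Record("account", {"number": "A-101", "balance": 500}),
    ]),
])
```

Because every child has exactly one parent, a query such as "accounts of Johnson" is answered by locating the customer record and walking its subtree; relationships that do not fit the tree would need duplicated records or, in IMS, virtual records.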
A database model is a way of representing data that makes the data convenient to organize for design, storage, and processing.
A database model uses a mathematical model, based on sets and operations, to describe the database.
Common database models:
Hierarchical model
Network model
Relational model
Entity-relationship model
Object-oriented model (object model)
XML model – semi-structured (semi-stru...
Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey.
Data Model Operations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. Operations on the data model may include basic operations and user-defined operations.
The relational database uses the concept of linked two-dimensional tables consisting of rows and columns, as shown in Figure 1-2. Unlike the hierarchical approach, no predetermined relationship exists between distinct tables. This means that the data needed to link together the different areas of the network or hierarchical model need not be defined. Because relational users don’t need to understand the representation of data in storage to retrieve it (many such users created ad hoc queries against the data), ease of use helped popularize the relational model....
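To illustrate "no predetermined relationship": in the sketch below, which uses Python's built-in `sqlite3` module, the two tables are created without any declared link (not even a foreign key), and an ad hoc query relates them at retrieval time by matching `customer_id` values. The table names, column names, and data are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Two independent tables; no link between them is declared in the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE account  (account_id INTEGER PRIMARY KEY, customer_id INTEGER, balance REAL);
        INSERT INTO customer VALUES (1, 'Hayes'), (2, 'Johnson');
        INSERT INTO account  VALUES (101, 1, 400.0), (102, 1, 350.0), (103, 2, 500.0);
        """
    )
    return conn


def balance_by_customer(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Ad hoc query: the relationship exists only as matching column values in the join."""
    return conn.execute(
        """
        SELECT c.name, SUM(a.balance)
        FROM customer AS c
        JOIN account AS a ON a.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(balance_by_customer(build_demo_db()))  # [('Hayes', 750.0), ('Johnson', 500.0)]
```

The user never navigates stored pointers or parent-child links; the join condition states the relationship declaratively, and a different query could relate the same tables in a different way.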
This paper explores techniques to take advantage of the fundamental difference in structure between hidden Markov models (HMM) and hierarchical hidden Markov models (HHMM). The HHMM structure allows repeated parts of the model to be merged together. A merged model takes advantage of the recurring patterns within the hierarchy, and the clusters that exist in some sequences of observations, in order to increase the extraction accuracy.
We describe a simple variant of the interpolated Markov model with non-emitting state transitions and prove that it is strictly more powerful than any Markov model. Empirical results demonstrate that the non-emitting model outperforms the interpolated model on the Brown corpus and on the Wall Street Journal under a wide range of experimental conditions. The non-emitting model is also much less prone to overtraining. The remainder of our article consists of four sections.
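For reference, the baseline interpolated model mixes a higher-order estimate with a lower-order one. Below is a minimal bigram sketch, assuming a single fixed weight `lam` (interpolated Markov models generally let the weight depend on the history and estimate it on held-out data) and omitting the non-emitting transitions that are the paper's contribution. The class name is hypothetical.

```python
from collections import Counter


class InterpolatedBigramModel:
    """P(w | prev) = lam * P_ML(w | prev) + (1 - lam) * P_ML(w), with one fixed weight lam."""

    def __init__(self, tokens: list[str], lam: float = 0.7) -> None:
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must lie in [0, 1]")
        if not tokens:
            raise ValueError("need at least one training token")
        self.lam = lam
        self.unigrams = Counter(tokens)
        self.total = len(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        # How often each word occurs as the history (first word) of a bigram.
        self.histories = Counter(tokens[:-1])

    def unigram_prob(self, word: str) -> float:
        return self.unigrams[word] / self.total

    def prob(self, word: str, prev: str) -> float:
        p_uni = self.unigram_prob(word)
        n_prev = self.histories[prev]
        if n_prev == 0:
            # Unseen history: fall back to the lower-order model entirely.
            return p_uni
        p_bi = self.bigrams[(prev, word)] / n_prev
        return self.lam * p_bi + (1.0 - self.lam) * p_uni


if __name__ == "__main__":
    model = InterpolatedBigramModel("the cat sat on the mat".split())
    print(model.prob("cat", "the"))  # 0.7 * 1/2 + 0.3 * 1/6 = 0.4
```

Words never seen in training still receive probability zero here; a practical model would add a further level, such as a uniform distribution over the vocabulary.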
In this work we address the problem of unsupervised part-of-speech induction by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor process prior, providing an elegant and principled means of incorporating lexical characteristics.
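Hierarchical Pitman-Yor priors are often implemented through their Chinese-restaurant representation. The sketch below uses that standard formula and is not code from the paper. In a restaurant with discount d, strength θ, c_w customers and t_w tables for word w (totals c and t), the predictive probability is P(w) = (c_w − d·t_w)/(θ + c) + (θ + d·t)/(θ + c) · P_0(w). In a hierarchy, P_0 is the parent restaurant's predictive distribution. The vocabulary and seating counts are made-up illustrations.

```python
from typing import Callable, Mapping


def pyp_predictive(word: str,
                   customers: Mapping[str, int],
                   tables: Mapping[str, int],
                   discount: float,
                   strength: float,
                   base_prob: Callable[[str], float]) -> float:
    """Predictive probability of `word` in one Pitman-Yor restaurant.

    customers[w]: customers eating dish w; tables[w]: tables serving w.
    Assumes 0 <= discount < 1, strength > -discount, and a valid seating
    (every table has at least one customer, so customers[w] >= tables[w]).
    """
    c_total = sum(customers.values())
    t_total = sum(tables.values())
    if c_total == 0:
        return base_prob(word)
    denom = strength + c_total
    # Mass from existing tables serving `word`, each discounted by d.
    existing = (customers.get(word, 0) - discount * tables.get(word, 0)) / denom
    # Mass reserved for opening a new table, whose dish is drawn from the base distribution.
    new_table = (strength + discount * t_total) / denom
    return existing + new_table * base_prob(word)


VOCAB = ["a", "b", "c"]


def uniform(word: str) -> float:
    return 1.0 / len(VOCAB)


def parent(word: str) -> float:
    return pyp_predictive(word, {"a": 3, "b": 1}, {"a": 2, "b": 1}, 0.5, 1.0, uniform)


def child(word: str) -> float:
    # Hierarchy: the child restaurant's base distribution is the parent's predictive.
    return pyp_predictive(word, {"a": 1}, {"a": 1}, 0.5, 1.0, parent)
```

The discount d moves probability mass from frequent words toward the base distribution in proportion to the number of tables, which is the power-law smoothing behaviour that motivates this prior for lexical modelling.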
Surface realisation decisions in language generation can be sensitive to a language model, but also to decisions of content selection. We therefore propose the joint optimisation of content selection and surface realisation using Hierarchical Reinforcement Learning (HRL). To this end, we suggest a novel reward function that is induced from human data and is especially suited for surface realisation.