Clustering techniques

Showing 1-20 of 45 results for Clustering techniques
  • Focusing on multi-document personal name disambiguation, this paper develops an agglomerative clustering approach to the problem. We start from an analysis of pointwise mutual information between features and the ambiguous name, which yields a novel feature-weighting method for clustering. We then propose a trade-off measure between within-cluster compactness and between-cluster separation as a stopping criterion. Finally, we apply a labeling method to find representative features for each cluster. ...
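
The pointwise mutual information mentioned above can be illustrated with a minimal sketch. It uses the standard PMI definition over co-occurrence counts; the function name, the count-based estimate, and the example numbers are assumptions for illustration, and the paper's actual weighting scheme may differ.

```python
import math


def pmi(joint_count, feature_count, name_count, total):
    """Standard PMI (in bits) between a feature and a name.

    joint_count:   contexts containing both the feature and the name
    feature_count: contexts containing the feature
    name_count:    contexts containing the name
    total:         total number of contexts
    All counts are hypothetical inputs; this is not the paper's exact weight.
    """
    # p(f, n) / (p(f) * p(n)) simplifies to joint * total / (feature * name)
    ratio = (joint_count * total) / (feature_count * name_count)
    return math.log2(ratio)


# A feature that co-occurs with the name ten times more often than chance
strong = pmi(joint_count=10, feature_count=20, name_count=50, total=1000)
# A feature that co-occurs with the name exactly as often as chance predicts
neutral = pmi(joint_count=1, feature_count=10, name_count=100, total=1000)
```

A positive value suggests the feature is informative about the name, while a value near zero suggests the two are roughly independent.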

  • This paper presents an exploratory data analysis in lexical acquisition for adjective classes using clustering techniques. From a theoretical point of view, this approach provides large-scale empirical evidence for a sound classification. From a computational point of view, it helps develop a reliable automatic subclassification method. Results show that the features used in theoretical work can be successfully modelled in terms of shallow cues.

  • In this paper we present TroFi (Trope Finder), a system for automatically classifying literal and nonliteral usages of verbs through nearly unsupervised word-sense disambiguation and clustering techniques. TroFi uses sentential context instead of selectional constraint violations or paths in semantic hierarchies. It also uses literal and nonliteral seed sets acquired and cleaned without human supervision in order to bootstrap learning.

  • We propose a system which builds, in a semi-supervised manner, a resource that aims at helping a NER system to annotate corpus-specific named entities. This system is based on a distributional approach which uses syntactic dependencies for measuring similarities between named entities. The distinctive feature of the presented method, however, is that it combines a clique-based approach with a clustering technique, which amounts to a soft clustering method.

  • Huge amounts of multimedia data are constantly being generated in various forms around the world. With the ever-increasing complexity and variability of multimedia data, traditional rule-based approaches, in which humans have to discover the domain knowledge and encode it into a set of programming rules, are too costly and inadequate for analyzing the contents of this glut of multimedia data and extracting intelligence from it. The challenges in data complexity and variability have led to revolutions in machine learning techniques.

  • This paper explores techniques to take advantage of the fundamental difference in structure between hidden Markov models (HMM) and hierarchical hidden Markov models (HHMM). The HHMM structure allows repeated parts of the model to be merged together. A merged model takes advantage of the recurring patterns within the hierarchy, and the clusters that exist in some sequences of observations, in order to increase the extraction accuracy.

  • One of the major problems of K-means is that one must use dense vectors for its centroids, and therefore it is infeasible to store such huge vectors in memory when the feature space is high-dimensional. We address this issue by using feature hashing (Weinberger et al., 2009), a dimension-reduction technique, which can reduce the size of dense vectors while retaining sparsity of sparse vectors.
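
A minimal sketch of the hashing trick described above, written in plain Python. The bucket count, the use of MD5 as the hash function, and the helper names `hash_features` and `dense_centroid` are assumptions for illustration, not details taken from the paper.

```python
import hashlib


def hash_features(features, n_buckets=16):
    """Map a sparse {feature_name: value} dict into n_buckets dimensions.

    Each feature name is hashed to a bucket index and a sign (the signed
    hashing trick). The result is still sparse: it holds at most one entry
    per input feature.
    """
    hashed = {}
    for name, value in features.items():
        digest = hashlib.md5(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_buckets
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        hashed[index] = hashed.get(index, 0.0) + sign * value
    return hashed


def dense_centroid(hashed_vectors, n_buckets=16):
    """Average hashed vectors into a dense centroid of fixed length n_buckets."""
    centroid = [0.0] * n_buckets
    for vec in hashed_vectors:
        for index, value in vec.items():
            centroid[index] += value / len(hashed_vectors)
    return centroid


doc = {"cluster": 2.0, "kmeans": 1.0}   # hypothetical bag-of-words features
hashed_doc = hash_features(doc, n_buckets=8)
centroid = dense_centroid([hashed_doc, hashed_doc], n_buckets=8)
```

The centroid length is fixed by `n_buckets` rather than by the size of the original feature space, which is what keeps dense centroids affordable in memory.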

  • Creates nested clusters. Agglomerative clustering algorithms vary in how the proximity of two clusters is computed. MIN (single link): susceptible to noise/outliers. MAX/GROUP AVERAGE: may not work well with non-globular clusters. The CURE algorithm tries to handle both problems. Often starts with a proximity matrix. A type of graph-based algorithm.
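
The linkage choices listed above can be sketched with a naive agglomerative loop. This is an illustrative sketch only: the function names are hypothetical, Euclidean distance is assumed, and distances are recomputed on every merge instead of being read from a precomputed proximity matrix.

```python
import math
from itertools import combinations


def cluster_distance(a, b, linkage):
    """Proximity of two clusters under MIN, MAX, or group-average linkage."""
    dists = [math.dist(p, q) for p in a for q in b]
    if linkage == "min":        # single link: closest pair of points
        return min(dists)
    if linkage == "max":        # complete link: farthest pair of points
        return max(dists)
    if linkage == "average":    # group average: mean over all pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, n_clusters, linkage="min"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j leaves index i valid
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
result = agglomerate(points, n_clusters=2, linkage="min")
```

Recording the sequence of merges instead of stopping at `n_clusters` would give the nested cluster hierarchy.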

  • We present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and applied experimentally on frame observations derived from parsing large corpora. We outline an interpretation of the learned representations as theoretical-linguistic decompositional lexical entries. ...

  • Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data. Methods like Maximum Entropy and Conditional Random Fields make use of features for training. These methods tend to overfit when the available training corpus is limited, especially if the number of features is large or the number of values for a feature is large. To overcome this, we propose two techniques for feature reduction based on word clustering and selection.

  • In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we investigate the effects of applying such a technique to higher-order n-gram models trained on large corpora.
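
For orientation, the best-known form of this idea is the class-based bigram model, in which each word $w_i$ belongs to a single class $c_i$. The formula below is that standard textbook factorization, given here as an assumption about the general technique; the higher-order models studied in the paper may be parameterized differently.

```latex
P(w_i \mid w_{i-1}) \approx P(c_i \mid c_{i-1}) \, P(w_i \mid c_i)
```

Because there are far fewer classes than words, the class transition term $P(c_i \mid c_{i-1})$ can be estimated from many more observations than the corresponding word bigram, which is how the partition reduces sparsity.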

  • This paper presents a hybrid approach to question answering in the clinical domain that combines techniques from summarization and information retrieval. We tackle a frequently-occurring class of questions that takes the form “What is the best drug treatment for X?” Starting from an initial set of MEDLINE citations, our system first identifies the drugs under study. Abstracts are then clustered using semantic classes from the UMLS ontology. Finally, a short extractive summary is generated for each abstract to populate the clusters. ...

  • We present a clustering algorithm for Arabic words sharing the same root. Root-based clusters can substitute for dictionaries in indexing for IR. Modifying Adamson and Boreham (1974), our two-stage algorithm applies light stemming before calculating word-pair similarity coefficients using techniques sensitive to Arabic morphology. Tests show a successful treatment of infixes and clustering accuracy of up to 94.06% for unedited Arabic text samples, without the use of dictionaries.
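
The word-pair similarity coefficient of Adamson and Boreham (1974) is usually described as a Dice coefficient over the unique character bigrams of two words. The sketch below shows only that baseline, on English words; the function names are hypothetical, and the light stemming and Arabic-specific modifications from the abstract are not reproduced.

```python
def bigrams(word):
    """Return the set of unique adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a, b):
    """Dice coefficient: 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0  # words shorter than two characters have no bigrams
    return 2 * len(x & y) / (len(x) + len(y))


# 6 shared bigrams out of 7 and 8 unique bigrams: 2 * 6 / (7 + 8) = 0.8
score = dice_similarity("statistics", "statistical")
```

Word pairs whose score exceeds a chosen threshold can then be linked into the same cluster.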

  • In this paper we present a method to group adjectives according to their meaning, as a first step towards the automatic identification of adjectival scales. We discuss the properties of adjectival scales and of groups of semantically related adjectives and how they imply sources of linguistic knowledge in text corpora. We describe how our system exploits this linguistic knowledge to compute a measure of similarity between two adjectives, using statistical techniques and without having access to any semantic information about the adjectives. ...

  • Semantic clusters of a domain form an important feature that can be useful for performing syntactic and semantic disambiguation. Several attempts have been made to extract the semantic clusters of a domain by probabilistic or taxonomic techniques. However, not much progress has been made in evaluating the obtained semantic clusters. This paper focuses on an evaluation mechanism that can be used to evaluate semantic clusters produced by a system against those provided by human experts.

  • When you have completed this chapter, you will be able to: organize raw data into a frequency distribution; produce a histogram, a frequency polygon, and a cumulative frequency polygon from quantitative data; develop and interpret a stem-and-leaf display; present qualitative data using graphical techniques such as a clustered bar chart, a stacked bar chart, and a pie chart; detect graphic deceptions and use a graph to present data with clarity, precision, and efficiency.

  • Cluster analysis is an unsupervised technique for grouping related objects without considering their label or class. Objects belonging to the same cluster are more similar to one another than to objects in other clusters. Cluster analysis is applied in areas such as gene expression analysis, galaxy formation, natural language processing, and image segmentation.

  • Cluster analysis is a technique for classifying data, i.e., dividing the given data into a set of classes or clusters.

  • The need for more rigorous and systematic research in public administration has grown as the complexity of problems in government and nonprofit organizations has increased. This book describes and explains the use of research methods that will strengthen the research efforts of those solving government and nonprofit problems. It is aimed primarily at those studying research methods in master's and doctoral level courses in curricula that concern the public and nonprofit sector.

  • The use of ethanol for fuel was widespread in Europe and the United States until the early 1900s (Illinois Corn Growers’ Association/Illinois Corn Marketing Board). Because it became more expensive to produce than petroleum-based fuel, especially after World War II, ethanol’s potential was largely ignored until the Arab oil embargo of the 1970s. One response to the embargo was increased use of the fuel extender “gasohol” (or E-10), a mixture of one part ethanol made from corn mixed with nine parts gasoline.
