Showing 1-20 of 61 results for Clustering algorithms
  • This paper discusses two problems that arise in the Generation of Referring Expressions: (a) numeric-valued attributes, such as size or location; (b) perspective-taking in reference. Both problems, it is argued, can be resolved if some structure is imposed on the available knowledge prior to content determination. We describe a clustering algorithm which is sufficiently general to be applied to these diverse problems, discuss its application, and evaluate its performance. ... ‘close’ on the given dimension, and ‘sufficiently distant’ from those of their distractors. ...

  • We present a clustering algorithm for Arabic words sharing the same root. Root-based clusters can substitute for dictionaries in indexing for IR. Modifying Adamson and Boreham (1974), our two-stage algorithm applies light stemming before calculating word-pair similarity coefficients using techniques sensitive to Arabic morphology. Tests show a successful treatment of infixes and clustering accuracy of up to 94.06% for unedited Arabic text samples, without the use of dictionaries.
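
The word-pair similarity step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the Dice coefficient over unique character bigrams commonly associated with Adamson and Boreham (1974), and the `light_stem` helper with its affix lists is hypothetical, written for Latin-script strings rather than Arabic.

```python
def bigrams(word):
    """Return the set of unique adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient over unique bigrams: 2 * |shared| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def light_stem(word, prefixes=("al",), suffixes=("at",)):
    """Strip at most one prefix and one suffix (hypothetical affix lists),
    keeping at least three characters of the word."""
    for p in prefixes:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word


if __name__ == "__main__":
    # 6 shared bigrams out of 7 and 8 unique ones.
    print(dice("statistics", "statistical"))
    print(light_stem("alkitab"))
```

Plain bigram overlap scores zero for pairs that differ only by internal vowels (for example `dice("kitab", "kutub")`), which suggests why the abstract stresses infix handling and morphology-sensitive techniques.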

  • A collection of international scientific research reports in chemistry, for chemistry enthusiasts to consult. Topic: Research Article An Energy Consumption Optimized Clustering Algorithm for Radar Sensor Networks Based on an Ant Colony Algorithm

  • A collection of research reports on chemistry published in a biology journal. Topic: Improvement for detection of microcalcifications through clustering algorithms and artificial neural networks

  • A collection of research reports on biology published in the Journal of Biology. Topic: Research Article A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics

  • Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross-document coreference approach that leverages the profiles of entities, which are constructed using information extraction tools and reconciled using a within-document coreference module. We propose to match the profiles using a learned ensemble distance function composed of a suite of similarity specialists.

  • Our paper reports an attempt to apply an unsupervised clustering algorithm to a Hungarian treebank in order to obtain semantic verb classes. Starting from the hypothesis that semantic metapredicates underlie verbs’ syntactic realization, we investigate how one can obtain semantically motivated verb classes by automatic means. The 150 most frequent Hungarian verbs were clustered on the basis of their complementation patterns, yielding a set of basic classes and hints about the features that determine verbal subcategorization. ...

  • To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For high-dimensional embeddings, power transformations on the chi-squared distances between clauses were explored.

  • In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we investigate the effects of applying such a technique to higher-order n-gram models trained on large corpora.

  • The EM clustering algorithm (Hofmann and Puzicha, 1998) used here is an unsupervised machine learning algorithm that has been applied in many NLP tasks, such as inducing a semantically labeled lexicon and determining lexical choice in machine translation (Rooth et al., 1998), automatic acquisition of verb semantic classes (Schulte im Walde, 2000) and automatic semantic labeling (Gildea and Jurafsky, 2002).

  • As digital libraries and the World Wide Web (WWW) continue to grow exponentially, the ability to find useful information will greatly depend on the associated underlying framework of the indexing infrastructure or search engine. The push to get information on-line must be mediated by the design of automated techniques for extracting that information for a variety of users and needs. What algorithms and software environments are plausible for achieving both accuracy and speed in text searching today?...

  • Creates nested clusters. Agglomerative clustering algorithms vary in how the proximity of two clusters is computed. MIN (single link): susceptible to noise/outliers. MAX/GROUP AVERAGE: may not work well with non-globular clusters. The CURE algorithm tries to handle both problems. Often starts with a proximity matrix. A type of graph-based algorithm.
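
The difference between the MIN and MAX proximity definitions can be sketched with a small example. This is an illustrative sketch only, assuming one-dimensional integer points and absolute difference as the distance; the function name `agglomerative` and its parameters are hypothetical and do not come from the slides.

```python
def agglomerative(points, k, linkage="single"):
    """Merge the two closest clusters until only k clusters remain.

    linkage="single"   -> MIN: distance between the closest pair of members
    linkage="complete" -> MAX: distance between the farthest pair of members
    """
    clusters = [[p] for p in points]
    pick = min if linkage == "single" else max

    def proximity(a, b):
        return pick(abs(x - y) for x in a for y in b)

    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = proximity(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    pts = [0, 10, 22, 36, 52, 70]  # gaps grow slowly: 10, 12, 14, 16, 18
    print(agglomerative(pts, 2, "single"))    # [[0, 10, 22, 36, 52], [70]]
    print(agglomerative(pts, 2, "complete"))  # [[0, 10, 22, 36], [52, 70]]
```

With single link, each point is absorbed into the growing chain because only the nearest members are compared, so one long cluster forms and the last point is left alone; complete link compares the farthest members and yields a more balanced split.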

  • We revisit the algorithm of Schütze (1995) for unsupervised part-of-speech tagging. The algorithm uses reduced-rank singular value decomposition followed by clustering to extract latent features from context distributions. As implemented here, it achieves state-of-the-art tagging accuracy at considerably less cost than more recent methods.

  • In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality.

  • We combine multiple word representations based on semantic clusters extracted from the (Brown et al., 1992) algorithm and syntactic clusters obtained from the Berkeley parser (Petrov et al., 2006) in order to improve discriminative dependency parsing in the MSTParser framework (McDonald et al., 2005).

  • We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the power and generality of this approach, we apply the method in two very different applications: named entity recognition and query classification. Our results show that phrase clusters offer significant improvements over word clusters. Our NER system achieves the best current result on the widely used CoNLL benchmark.

  • We present a novel framework for the discovery and representation of general semantic relationships that hold between lexical items. We propose that each such relationship can be identified with a cluster of patterns that captures this relationship. We give a fully unsupervised algorithm for pattern cluster discovery, which searches, clusters and merges high-frequency word-based patterns around randomly selected hook words. Pattern clusters can be used to extract instances of the corresponding relationships. ...

  • Effectively identifying events in unstructured text is a very difficult task. This is largely due to the fact that an individual event can be expressed by several sentences. In this paper, we investigate the use of clustering methods for the task of grouping the text spans in a news article that refer to the same event. The key idea is to cluster the sentences, using a novel distance metric that exploits regularities in the sequential structure of events within a document.

  • In this paper, we explore the power of randomized algorithms to address the challenge of working with very large amounts of data. We apply these algorithms to generate noun similarity lists from 70 million pages. We reduce the running time from quadratic to practically linear in the number of elements to be computed.

  • We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and using the acquired word classes to improve the accuracy of syntactic disambiguation. We view this problem as that of estimating a joint probability distribution specifying the joint probabilities of word pairs, such as noun-verb pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability distribution. Our method is a natural extension of those proposed in (Brown et al.
