Showing 1-20 of 67 results for The set tag
  • Nowadays, with the development of technology, an application can be split into multiple parts (layers) that run on different computers, for example: the presentation (user interface) layer, the business logic (controller) layer, and the data access (model) layer. The MVC model emerged to meet these requirements, and it has been chosen as the architecture for deploying web applications....

    doc52p phantom12137 24-10-2012 130 74   Download

  • Many of the tasks required for semantic tagging of phrases and texts rely on a list of words annotated with some semantic features. We present a method for extracting sentiment-bearing adjectives from WordNet using the Sentiment Tag Extraction Program (STEP). We did 58 STEP runs on unique non-intersecting seed lists drawn from a manually annotated list of positive and negative adjectives and evaluated the results against other manually annotated lists. The 58 runs were then collapsed into a single set of 7,813 unique words. ...

    pdf8p bunthai_1 06-05-2013 20 2   Download

  • We describe an approach to simultaneous tokenization and part-of-speech tagging that is based on separating the closed and open-class items, and focusing on the likelihood of the possible stems of the open-class words. By encoding some basic linguistic information, the machine learning task is simplified, while achieving state-of-the-art tokenization results and competitive POS results, although with a reduced tag set and some evaluation difficulties.

    pdf6p hongdo_1 12-04-2013 20 3   Download

  • We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

    pdf6p hongdo_1 12-04-2013 20 3   Download

  • The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallow-parsed.

    pdf10p hongdo_1 12-04-2013 21 2   Download

  • This paper introduces a new training set condensation technique designed for mixtures of labeled and unlabeled data. It finds a condensed set of labeled and unlabeled data points, typically smaller than what is obtained using condensed nearest neighbor on the labeled data only, and improves classification accuracy. We evaluate the algorithm on semisupervised part-of-speech tagging and present the best published result on the Wall Street Journal data set.

    pdf5p hongdo_1 12-04-2013 5 2   Download

  • We describe a novel method for the task of unsupervised POS tagging with a dictionary, one that uses integer programming to explicitly search for the smallest model that explains the data, and then uses EM to set parameter values. We evaluate our method on a standard test corpus using different standard tagsets (a 45-tagset as well as a smaller 17-tagset), and show that our approach performs better than existing state-of-the-art systems in both settings.

    pdf9p hongphan_1 14-04-2013 21 2   Download

  • We present a novel approach to parse web search queries for the purpose of automatic tagging of the queries. We will define a set of probabilistic context-free rules, which generates bags (i.e. multi-sets) of words. Using this new type of rule in combination with the traditional probabilistic phrase structure rules, we define a hybrid grammar, which treats each search query as a bag of chunks (i.e. phrases). A hybrid probabilistic parser is used to parse the queries.

    pdf9p hongphan_1 14-04-2013 11 2   Download

  • This paper introduced the main features of the UAM CorpusTool, software for human and semi-automatic annotation of text and images. The demonstration will show how to set up an annotation project, how to annotate text files at multiple annotation levels, how to automatically assign tags to segments matching lexical patterns, and how to perform cross-layer searches of the corpus.

    pdf4p hongphan_1 15-04-2013 12 2   Download

  • Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. ...

    pdf8p hongvang_1 16-04-2013 16 2   Download

  • This paper describes our work on building a Part-of-Speech (POS) tagger for Bengali. We have used Hidden Markov Model (HMM) and Maximum Entropy (ME) based stochastic taggers. Bengali is a morphologically rich language and our taggers make use of morphological and contextual information of the words. Since only a small labeled training set is available (45,000 words), a simple stochastic approach does not yield very good results. In this work, we have studied the effect of using a morphological analyzer to improve the performance of the tagger. ...

    pdf4p hongvang_1 16-04-2013 14 2   Download

  • The limited coverage of lexical-semantic resources is a significant problem for NLP systems which can be alleviated by automatically classifying the unknown words. Supersense tagging assigns unknown nouns one of 26 broad semantic categories used by lexicographers to organise their manual insertion into WordNet. Ciaramita and Johnson (2003) present a tagger which uses synonym set glosses as annotated training examples. We describe an unsupervised approach, based on vector-space similarity, which does not require annotated examples but significantly outperforms their tagger. ...

    pdf8p bunbo_1 17-04-2013 18 2   Download

  • We report an empirical study on the role of syntactic features in building a semi-supervised named entity (NE) tagger. Our study addresses two questions: What types of syntactic features are suitable for extracting potential NEs to train a classifier in a semi-supervised setting? How good is the resulting NE classifier on testing instances dissimilar from its training data? Our study shows that constituency and dependency parsing constraints are both suitable features to extract NEs and train the classifier. ...

    pdf4p bunbo_1 17-04-2013 14 2   Download

  • This paper presents a restricted version of Set-Local Multi-Component TAGs (Weir, 1988) which retains the strong generative capacity of Tree-Local Multi-Component TAG (i.e., produces the same derived structures) but has a greater derivational generative capacity (i.e., can derive those structures in more ways). This formalism is then applied as a framework for integrating dependency and constituency based linguistic representations.

    pdf8p bunrieu_1 18-04-2013 11 2   Download

  • The GDA (Global Document Annotation) project proposes a tag set which allows machines to automatically infer the underlying semantic/pragmatic structure of documents. Its objectives are to promote development and spread of NLP/AI applications to render GDA-tagged documents versatile and intelligent contents, which should motivate WWW (World Wide Web) users to tag their documents as part of content authoring.

    pdf5p bunrieu_1 18-04-2013 10 2   Download

  • For the task of recognizing dialogue acts, we are applying the Transformation-Based Learning (TBL) machine learning algorithm. To circumvent a sparse data problem, we extract values of well-motivated features of utterances, such as speaker direction, punctuation marks, and a new feature, called dialogue act cues, which we find to be more effective than cue phrases and word n-grams in practice.

    pdf7p bunrieu_1 18-04-2013 12 2   Download

  • While PCFGs can be accurate, they suffer from vocabulary coverage problems: treebanks are small and lexicons induced from them are limited. The reason for this treebank-centric view in PCFG learning is 3-fold: the English treebank is fairly large and English morphology is fairly simple, so that in English, the treebank does provide mostly adequate lexical coverage; lexicons enumerate analyses, but don’t provide probabilities for them; and, most importantly, the treebank and the external lexicon are likely to follow different annotation schemas, reflecting different linguistic perspectives.

    pdf9p bunthai_1 06-05-2013 12 2   Download

  • Low interannotator agreement (IAA) is a well-known issue in manual semantic tagging (sense tagging). IAA correlates with the granularity of word senses and they both correlate with the amount of information they give as well as with its reliability.

    pdf11p bunthai_1 06-05-2013 17 2   Download

  • In this paper, we present a new method for learning to find translations and transliterations on the Web for a given term. The approach involves using a small set of terms and translations to obtain mixed-code snippets from a search engine, and automatically annotating the snippets with tags and features for training a conditional random field model.

    pdf5p nghetay_1 07-04-2013 9 1   Download

  • This paper presents a novel way of improving POS tagging on heterogeneous data. First, two separate models are trained (generalized and domain-specific) from the same data set by controlling lexical items with different document frequencies.

    pdf5p nghetay_1 07-04-2013 13 1   Download
