
Tokenizer automatically

Showing 1-12 of 12 results for "Tokenizer automatically"
  • In this paper, we propose applying a jointly developed model to the task of multilingual visual question answering. Specifically, we conduct experiments on a multimodal sequence-to-sequence transformer model derived from the T5 encoder-decoder architecture. Text tokens and dense Vision Transformer (ViT) image embeddings are fed to the encoder, and a decoder is then used to automatically predict discrete text tokens.

    PDF, 11 pages · dianmotminh02 · 03-05-2024 · 6 views · 1 download

  • Lecture Compiler construction: Lesson 5 - Sohail Aslam. The main topics covered in this chapter include: lexical analysis, a recall of the compiler front-end, ad-hoc lexers, hand-written code to generate tokens, the look-ahead required to decide where one token ends and the next token begins,...

    PPT, 34 pages · youzhangjing_1909 · 28-04-2022 · 9 views · 2 downloads
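The look-ahead point from this lecture can be illustrated with a minimal hand-written lexer (a hypothetical Python sketch, not code from the lecture): the lexer cannot decide where a number or identifier token ends until it peeks at the next character.

```python
# Minimal ad-hoc lexer sketch: token boundaries are found by looking
# ahead until the next character no longer belongs to the current token.

def tokenize(src):
    tokens = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isdigit():
            # look ahead: the NUM token ends at the first non-digit
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("NUM", src[i:j]))
            i = j
        elif c.isalpha() or c == "_":
            # look ahead: the ID token ends at the first non-identifier char
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("ID", src[i:j]))
            i = j
        else:
            tokens.append(("OP", c))
            i += 1
    return tokens

print(tokenize("x1 = 42 + y"))
# → [('ID', 'x1'), ('OP', '='), ('NUM', '42'), ('OP', '+'), ('ID', 'y')]
```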

  • Work at the Unit for Computer Research on the English Language at the University of Lancaster has been directed towards producing a grammatically annotated version of the Lancaster-Oslo/Bergen (LOB) Corpus of written British English texts as the preliminary stage in developing computer programs and data files for providing a grammatical analysis of unrestricted English text. From 1981-83, a suite of PASCAL programs was devised to automatically produce a single level of grammatical description, with one word tag representing the word class or part of speech of each word token in the corpus.

    PDF, 7 pages · buncha_1 · 08-05-2013 · 43 views · 1 download
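The "one word tag per word token" idea can be sketched as a toy lexicon lookup (this is only an illustration in Python; the LOB work used a suite of PASCAL programs and a far richer tagset):

```python
# Toy one-tag-per-token annotator: each token gets a single word-class tag
# from a tiny hand-built lexicon; unknown words fall back to a default tag.
# Lexicon and tag names here are made up for illustration.

LEXICON = {"the": "AT", "cat": "NN", "sat": "VBD", "on": "IN", "mat": "NN"}

def tag(tokens, default="NN"):
    return [(t, LEXICON.get(t.lower(), default)) for t in tokens]

print(tag("The cat sat on the mat".split()))
```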

  • Data-driven approaches in computational semantics are not common because only a few semantically annotated resources are available. We are building a large corpus of public-domain English texts and annotate them semi-automatically with syntactic structures (derivations in Combinatory Categorial Grammar) and semantic representations (Discourse Representation Structures), including events, thematic roles, named entities, anaphora, scope, and rhetorical structure. We have created a wiki-like Web-based platform on which a crowd of expert annotators (i.e.

    PDF, 5 pages · bunthai_1 · 06-05-2013 · 50 views · 2 downloads

  • We apply topic modelling to automatically induce word senses of a target word, and demonstrate that our word sense induction method can be used to automatically detect words with emergent novel senses, as well as token occurrences of those senses. We start by exploring the utility of standard topic models for word sense induction (WSI), with a pre-determined number of topics (=senses). We next demonstrate that a non-parametric formulation that learns an appropriate number of senses per word actually performs better at the WSI task. ...

    PDF, 11 pages · bunthai_1 · 06-05-2013 · 46 views · 4 downloads
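The core of word sense induction — grouping occurrences of a target word by their contexts — can be shown with a deliberately simplified stand-in: instead of the topic models the paper uses, this sketch greedily clusters occurrences whose context bags overlap (all data is invented for illustration):

```python
# Much-simplified WSI stand-in: each occurrence of the target word is a bag
# of context words; occurrences are greedily merged into a "sense" cluster
# when their contexts share at least one word. NOT the paper's topic model.

def induce_senses(contexts):
    senses = []   # each sense is a growing set of context words
    labels = []   # induced sense id per occurrence
    for ctx in contexts:
        ctx = set(ctx)
        for k, sense in enumerate(senses):
            if ctx & sense:          # overlap -> same induced sense
                sense |= ctx
                labels.append(k)
                break
        else:                        # no overlap -> new sense
            senses.append(ctx)
            labels.append(len(senses) - 1)
    return labels

# four occurrences of "bank": two financial, two riverside
occs = [["river", "water"], ["money", "loan"],
        ["water", "fish"], ["loan", "interest"]]
print(induce_senses(occs))  # → [0, 1, 0, 1]
```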

  • This paper examines the extent to which verb diathesis alternations are empirically attested in corpus data. We automatically acquire alternating verbs from large balanced corpora by using partial-parsing methods and taxonomic information, and discuss how corpus data can be used to quantify linguistic generalizations. We estimate the productivity of an alternation and the typicality of its members using type and token frequencies.

    PDF, 8 pages · bunrieu_1 · 18-04-2013 · 40 views · 4 downloads
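The type/token distinction used above is easy to make concrete (a minimal sketch with an invented verb list; the paper's corpora and estimators are of course much richer): token frequency counts every occurrence, type frequency counts distinct verbs.

```python
# Type vs. token frequency: tokens = all occurrences, types = distinct items.

def type_and_token_freq(verb_occurrences):
    tokens = len(verb_occurrences)
    types = len(set(verb_occurrences))
    return types, tokens

occs = ["break", "cut", "break", "open", "break"]
print(type_and_token_freq(occs))  # → (3, 5): 3 types, 5 tokens
```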

  • We propose a novel method for automatically interpreting compound nouns based on a predefined set of semantic relations. First we map verb tokens in sentential contexts to a fixed set of seed verbs using WordNet::Similarity and Moby’s Thesaurus. We then match the sentences with semantic relations based on the semantics of the seed verbs and grammatical roles of the head noun and modifier. Based on the semantics of the matched sentences, we then build a classifier using TiMBL.

    PDF, 8 pages · hongvang_1 · 16-04-2013 · 47 views · 1 download

  • In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we investigate the effects of applying such a technique to higher-order n-gram models trained on large corpora.

    PDF, 8 pages · hongphan_1 · 15-04-2013 · 46 views · 1 download
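The partitioning idea can be sketched in a few lines (a hypothetical class map and toy corpus, not the paper's setup): rare words are replaced by their equivalence class before counting, so n-gram statistics are pooled across class members.

```python
# Class-based bigram counting sketch: words are mapped to equivalence
# classes, then bigram counts are collected over the class sequence.
# CLASS_OF and the corpus are invented for illustration.
from collections import Counter

CLASS_OF = {"monday": "DAY", "friday": "DAY", "paris": "CITY"}

def class_bigram_counts(tokens):
    classes = [CLASS_OF.get(t, t) for t in tokens]
    return Counter(zip(classes, classes[1:]))

counts = class_bigram_counts("on monday on friday".split())
print(counts[("on", "DAY")])  # → 2: both day names share one class
```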

  • While Active Learning (AL) has already been shown to markedly reduce the annotation efforts for many sequence labeling tasks compared to random selection, AL remains unconcerned about the internal structure of the selected sequences (typically, sentences). We propose a semisupervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators, while all others in the selected sequences are automatically labeled.

    PDF, 9 pages · hongphan_1 · 14-04-2013 · 64 views · 2 downloads
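A common way to find "highly uncertain subsequences" is a per-token margin criterion; this sketch uses that heuristic with invented probabilities (the paper's exact selection scheme may differ):

```python
# Margin-based uncertainty sketch for AL: pick tokens where the gap between
# the best and second-best label probability falls below a threshold.

def uncertain_indices(token_probs, margin_threshold=0.2):
    picked = []
    for i, probs in enumerate(token_probs):
        best, second = sorted(probs, reverse=True)[:2]
        if best - second < margin_threshold:
            picked.append(i)
    return picked

sentence = [[0.90, 0.05, 0.05],   # confident -> auto-label
            [0.45, 0.40, 0.15],   # uncertain -> ask a human
            [0.80, 0.10, 0.10]]   # confident -> auto-label
print(uncertain_indices(sentence))  # → [1]
```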

  • Short Messaging Service (SMS) is popularly used to provide information access to people on the move. This has resulted in the growth of SMS based Question Answering (QA) services. However, automatically handling SMS questions poses significant challenges due to their inherent noise. In this work we present an automatic FAQ-based question answering system for SMS users. We handle the noise in an SMS query by formulating the query similarity over FAQ questions as a combinatorial search problem.

    PDF, 9 pages · hongphan_1 · 14-04-2013 · 56 views · 2 downloads
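The noisy-query-to-FAQ matching problem can be illustrated with a simple character-similarity scorer (using `difflib` per token; this is a stand-in for, not the paper's, combinatorial search formulation, and the FAQ data is invented):

```python
# Noisy SMS -> FAQ matching sketch: each SMS token is aligned to its most
# similar FAQ token by character similarity, and the FAQ with the highest
# total score wins. Tolerates SMS-style misspellings like "pasword".
from difflib import SequenceMatcher

def token_sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

def best_faq(sms_query, faq_questions):
    def score(faq):
        faq_tokens = faq.lower().split()
        return sum(max(token_sim(q, f) for f in faq_tokens)
                   for q in sms_query.lower().split())
    return max(faq_questions, key=score)

faqs = ["how do i reset my password", "what is my account balance"]
print(best_faq("hw do i reset pasword", faqs))
# → "how do i reset my password"
```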

  • This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurrence of MWEs with distinct postpositions into account.

    PDF, 6 pages · hongdo_1 · 12-04-2013 · 51 views · 2 downloads
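The unsupervised extraction step can be sketched as frequency-thresholded adjacent-pair collection (a toy English example; the paper works on Urdu and adds postposition-based heuristics for classification):

```python
# Toy MWE candidate extraction: adjacent word pairs whose co-occurrence
# count reaches a threshold are kept as multiword-expression candidates.
from collections import Counter

def extract_mwes(tokens, min_count=2):
    pairs = Counter(zip(tokens, tokens[1:]))
    return {p for p, c in pairs.items() if c >= min_count}

text = "new york in new york from old town".split()
print(extract_mwes(text))  # → {('new', 'york')}
```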

  • We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours.

    PDF, 10 pages · nghetay_1 · 07-04-2013 · 41 views · 2 downloads
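The "standard n-gram estimation" the abstract refers to is count collection followed by maximum-likelihood division; a minimal sketch over linear context (the paper applies the same counting machinery to tree contexts):

```python
# MLE bigram estimation sketch: P(w2 | w1) = count(w1 w2) / count(w1).
from collections import Counter

def mle_bigram(tokens):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return lambda w1, w2: bigrams[(w1, w2)] / unigrams[w1]

p = mle_bigram("a b a b a c".split())
print(p("a", "b"))  # → 2/3: "a" is followed by "b" in 2 of its 3 occurrences
```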
