
Tokenizer automatically

Showing 1-12 of 12 results for "Tokenizer automatically"
  • In this paper, we propose applying a jointly developed model to the task of multilingual visual question answering. Specifically, we conduct experiments on a multimodal sequence-to-sequence transformer model derived from the T5 encoder-decoder architecture. Text tokens and dense Vision Transformer (ViT) image embeddings are fed to the encoder, and a decoder is then used to automatically predict discrete text tokens.

    PDF, 11 pages · dianmotminh02 · 03-05-2024 · 6 views · 1 download

  • Lecture Compiler construction: Lesson 5 - Sohail Aslam. The main topics covered in this chapter include: lexical analysis, a recall of the compiler front-end, ad-hoc lexers, hand-written code to generate tokens, the look-ahead required to decide where one token ends and the next token begins,...

    PPT, 34 pages · youzhangjing_1909 · 28-04-2022 · 9 views · 2 downloads
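The look-ahead point from this lecture can be illustrated with a minimal hand-written lexer (a hypothetical Python sketch, not code from the lecture): the lexer cannot decide where a number or identifier token ends until it peeks at the next character.

```python
# Minimal ad-hoc lexer sketch: token boundaries are found by looking
# ahead until the next character no longer belongs to the current token.

def tokenize(src):
    tokens = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isdigit():
            # look ahead: the NUM token ends at the first non-digit
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("NUM", src[i:j]))
            i = j
        elif c.isalpha() or c == "_":
            # look ahead: the ID token ends at the first non-identifier char
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("ID", src[i:j]))
            i = j
        else:
            tokens.append(("OP", c))
            i += 1
    return tokens

print(tokenize("x1 = 42 + y"))
# → [('ID', 'x1'), ('OP', '='), ('NUM', '42'), ('OP', '+'), ('ID', 'y')]
```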

  • Work at the Unit for Computer Research on the English Language at the University of Lancaster has been directed towards producing a grammatically annotated version of the Lancaster-Oslo/Bergen (LOB) Corpus of written British English texts as the preliminary stage in developing computer programs and data files for providing a grammatical analysis of unrestricted English text. From 1981-83, a suite of PASCAL programs was devised to automatically produce a single level of grammatical description, with one word tag representing the word class or part of speech of each word token in the corpus.

    PDF, 7 pages · buncha_1 · 08-05-2013 · 43 views · 1 download
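The "one word tag per word token" idea can be sketched as a toy lexicon lookup (this is only an illustration in Python; the LOB work used a suite of PASCAL programs and a far richer tagset):

```python
# Toy one-tag-per-token annotator: each token gets a single word-class tag
# from a tiny hand-built lexicon; unknown words fall back to a default tag.
# Lexicon and tag names here are made up for illustration.

LEXICON = {"the": "AT", "cat": "NN", "sat": "VBD", "on": "IN", "mat": "NN"}

def tag(tokens, default="NN"):
    return [(t, LEXICON.get(t.lower(), default)) for t in tokens]

print(tag("The cat sat on the mat".split()))
```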

  • Data-driven approaches in computational semantics are not common because only a few semantically annotated resources are available. We are building a large corpus of public-domain English texts and annotate them semi-automatically with syntactic structures (derivations in Combinatory Categorial Grammar) and semantic representations (Discourse Representation Structures), including events, thematic roles, named entities, anaphora, scope, and rhetorical structure. We have created a wiki-like Web-based platform on which a crowd of expert annotators (i.e.

    PDF, 5 pages · bunthai_1 · 06-05-2013 · 50 views · 2 downloads

  • We apply topic modelling to automatically induce word senses of a target word, and demonstrate that our word sense induction method can be used to automatically detect words with emergent novel senses, as well as token occurrences of those senses. We start by exploring the utility of standard topic models for word sense induction (WSI), with a pre-determined number of topics (=senses). We next demonstrate that a non-parametric formulation that learns an appropriate number of senses per word actually performs better at the WSI task. ...

    PDF, 11 pages · bunthai_1 · 06-05-2013 · 46 views · 4 downloads
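The core of word sense induction — grouping occurrences of a target word by their contexts — can be shown with a deliberately simplified stand-in: instead of the topic models the paper uses, this sketch greedily clusters occurrences whose context bags overlap (all data is invented for illustration):

```python
# Much-simplified WSI stand-in: each occurrence of the target word is a bag
# of context words; occurrences are greedily merged into a "sense" cluster
# when their contexts share at least one word. NOT the paper's topic model.

def induce_senses(contexts):
    senses = []   # each sense is a growing set of context words
    labels = []   # induced sense id per occurrence
    for ctx in contexts:
        ctx = set(ctx)
        for k, sense in enumerate(senses):
            if ctx & sense:          # overlap -> same induced sense
                sense |= ctx
                labels.append(k)
                break
        else:                        # no overlap -> new sense
            senses.append(ctx)
            labels.append(len(senses) - 1)
    return labels

# four occurrences of "bank": two financial, two riverside
occs = [["river", "water"], ["money", "loan"],
        ["water", "fish"], ["loan", "interest"]]
print(induce_senses(occs))  # → [0, 1, 0, 1]
```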

  • This paper examines the extent to which verb diathesis alternations are empirically attested in corpus data. We automatically acquire alternating verbs from large balanced corpora by using partial-parsing methods and taxonomic information, and discuss how corpus data can be used to quantify linguistic generalizations. We estimate the productivity of an alternation and the typicality of its members using type and token frequencies.

    PDF, 8 pages · bunrieu_1 · 18-04-2013 · 40 views · 4 downloads
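The type/token distinction used above is easy to make concrete (a minimal sketch with an invented verb list; the paper's corpora and estimators are of course much richer): token frequency counts every occurrence, type frequency counts distinct verbs.

```python
# Type vs. token frequency: tokens = all occurrences, types = distinct items.

def type_and_token_freq(verb_occurrences):
    tokens = len(verb_occurrences)
    types = len(set(verb_occurrences))
    return types, tokens

occs = ["break", "cut", "break", "open", "break"]
print(type_and_token_freq(occs))  # → (3, 5): 3 types, 5 tokens
```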

  • We propose a novel method for automatically interpreting compound nouns based on a predefined set of semantic relations. First we map verb tokens in sentential contexts to a fixed set of seed verbs using WordNet::Similarity and Moby’s Thesaurus. We then match the sentences with semantic relations based on the semantics of the seed verbs and grammatical roles of the head noun and modifier. Based on the semantics of the matched sentences, we then build a classifier using TiMBL.

    PDF, 8 pages · hongvang_1 · 16-04-2013 · 47 views · 1 download

  • In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we investigate the effects of applying such a technique to higher-order n-gram models trained on large corpora.

    PDF, 8 pages · hongphan_1 · 15-04-2013 · 46 views · 1 download
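The partitioning idea can be sketched in a few lines (a hypothetical class map and toy corpus, not the paper's setup): rare words are replaced by their equivalence class before counting, so n-gram statistics are pooled across class members.

```python
# Class-based bigram counting sketch: words are mapped to equivalence
# classes, then bigram counts are collected over the class sequence.
# CLASS_OF and the corpus are invented for illustration.
from collections import Counter

CLASS_OF = {"monday": "DAY", "friday": "DAY", "paris": "CITY"}

def class_bigram_counts(tokens):
    classes = [CLASS_OF.get(t, t) for t in tokens]
    return Counter(zip(classes, classes[1:]))

counts = class_bigram_counts("on monday on friday".split())
print(counts[("on", "DAY")])  # → 2: both day names share one class
```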

  • While Active Learning (AL) has already been shown to markedly reduce the annotation efforts for many sequence labeling tasks compared to random selection, AL remains unconcerned about the internal structure of the selected sequences (typically, sentences). We propose a semisupervised AL approach for sequence labeling where only highly uncertain subsequences are presented to human annotators, while all others in the selected sequences are automatically labeled.

    PDF, 9 pages · hongphan_1 · 14-04-2013 · 64 views · 2 downloads
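A common way to find "highly uncertain subsequences" is a per-token margin criterion; this sketch uses that heuristic with invented probabilities (the paper's exact selection scheme may differ):

```python
# Margin-based uncertainty sketch for AL: pick tokens where the gap between
# the best and second-best label probability falls below a threshold.

def uncertain_indices(token_probs, margin_threshold=0.2):
    picked = []
    for i, probs in enumerate(token_probs):
        best, second = sorted(probs, reverse=True)[:2]
        if best - second < margin_threshold:
            picked.append(i)
    return picked

sentence = [[0.90, 0.05, 0.05],   # confident -> auto-label
            [0.45, 0.40, 0.15],   # uncertain -> ask a human
            [0.80, 0.10, 0.10]]   # confident -> auto-label
print(uncertain_indices(sentence))  # → [1]
```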

  • Short Messaging Service (SMS) is popularly used to provide information access to people on the move. This has resulted in the growth of SMS based Question Answering (QA) services. However, automatically handling SMS questions poses significant challenges due to their inherent noise. In this work we present an automatic FAQ-based question answering system for SMS users. We handle the noise in an SMS query by formulating the query similarity over FAQ questions as a combinatorial search problem.

    PDF, 9 pages · hongphan_1 · 14-04-2013 · 56 views · 2 downloads
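The noisy-query-to-FAQ matching problem can be illustrated with a simple character-similarity scorer (using `difflib` per token; this is a stand-in for, not the paper's, combinatorial search formulation, and the FAQ data is invented):

```python
# Noisy SMS -> FAQ matching sketch: each SMS token is aligned to its most
# similar FAQ token by character similarity, and the FAQ with the highest
# total score wins. Tolerates SMS-style misspellings like "pasword".
from difflib import SequenceMatcher

def token_sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

def best_faq(sms_query, faq_questions):
    def score(faq):
        faq_tokens = faq.lower().split()
        return sum(max(token_sim(q, f) for f in faq_tokens)
                   for q in sms_query.lower().split())
    return max(faq_questions, key=score)

faqs = ["how do i reset my password", "what is my account balance"]
print(best_faq("hw do i reset pasword", faqs))
# → "how do i reset my password"
```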

  • This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around 8.12 million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurrence of MWEs with distinct postpositions into account.

    PDF, 6 pages · hongdo_1 · 12-04-2013 · 51 views · 2 downloads
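The unsupervised extraction step can be sketched as frequency-thresholded adjacent-pair collection (a toy English example; the paper works on Urdu and adds postposition-based heuristics for classification):

```python
# Toy MWE candidate extraction: adjacent word pairs whose co-occurrence
# count reaches a threshold are kept as multiword-expression candidates.
from collections import Counter

def extract_mwes(tokens, min_count=2):
    pairs = Counter(zip(tokens, tokens[1:]))
    return {p for p, c in pairs.items() if c >= min_count}

text = "new york in new york from old town".split()
print(extract_mwes(text))  # → {('new', 'york')}
```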

  • We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours.

    PDF, 10 pages · nghetay_1 · 07-04-2013 · 41 views · 2 downloads
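The "standard n-gram estimation" the abstract refers to is count collection followed by maximum-likelihood division; a minimal sketch over linear context (the paper applies the same counting machinery to tree contexts):

```python
# MLE bigram estimation sketch: P(w2 | w1) = count(w1 w2) / count(w1).
from collections import Counter

def mle_bigram(tokens):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return lambda w1, w2: bigrams[(w1, w2)] / unigrams[w1]

p = mle_bigram("a b a b a c".split())
print(p("a", "b"))  # → 2/3: "a" is followed by "b" in 2 of its 3 occurrences
```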
