zunia.vn

Bilingual corpora

Showing 1-20 of 59 results for Bilingual corpora

Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary

Electronic medical record (EMR) systems have become widely used throughout the world to improve the quality of healthcare and the efficiency of hospital services. A bilingual Chinese-English medical lexicon is needed to meet the demand for multilingual and multinational treatment.

10p vikentucky2711 24-11-2020 9 0

Scientific report: "Automating the Acquisition of Bilingual Terminology"

As the acquisition problem of bilingual lists of terminological expressions is formidable, it is worthwhile to investigate methods to compile such lists as automatically as possible. In this paper we discuss experimental results for a number of methods, which operate on corpora of previously translated texts. Keywords: parallel corpora, tagging, terminology acquisition.

7p buncha_1 08-05-2013 50 1

Scientific report: "An Alignment Method for Noisy Parallel Corpora based on Image Processing Techniques"

This paper presents a new approach to the bitext correspondence problem (BCP) of noisy bilingual corpora based on image processing (IP) techniques. By using one of several ways of estimating the lexical translation probability (LTP) between pairs of source and target words, we can turn a bitext into a discrete gray-level image. We contend that the BCP, when seen in this light, bears a striking resemblance to the line detection problem in IP.

8p bunthai_1 06-05-2013 54 3

Scientific report: "Automatically Generated Customizable Online Dictionaries"

The aim of our software presentation is to demonstrate that corpus-driven bilingual dictionaries generated fully by automatic means are suitable for human use. Previous experiments have proven that bilingual lexicons can be created by applying word alignment on parallel corpora. Such an approach, especially the corpus-driven nature of it, yields several advantages over more traditional approaches. Most importantly, automatically attained translation probabilities are able to guarantee that the most frequently used translations come first within an entry.

7p bunthai_1 06-05-2013 38 4

Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge"

In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion, without any prior knowledge about the language pair, relying on a symmetrization process and the one-to-one constraint.

11p bunthai_1 06-05-2013 55 2

Scientific report: "Does more data always yield better translations?"

Nowadays, there are large amounts of data available to train statistical machine translation systems. However, it is not clear whether all the training data actually help or not. A system trained on a subset of such huge bilingual corpora might outperform the use of all the bilingual data. This paper studies such issues by analysing two training data selection techniques: one based on approximating the probability of an in-domain corpus; and another based on infrequent n-gram occurrence.

10p bunthai_1 06-05-2013 46 3

Scientific report: "Toward Statistical Machine Translation without Parallel Corpora"

We estimate the parameters of a phrase-based statistical machine translation system from monolingual corpora instead of a bilingual parallel corpus. We extend existing research on bilingual lexicon induction to estimate both lexical and phrasal translation probabilities for MT-scale phrase-tables. We propose a novel algorithm to estimate reordering probabilities from monolingual data. We report translation results for an end-to-end translation system using these monolingual features alone.

11p bunthai_1 06-05-2013 51 3

Scientific report: "Power-Law Distributions for Paraphrases Extracted from Bilingual Corpora"

We describe a novel method that extracts paraphrases from a bitext, for both the source and target languages. In order to reduce the search space, we decompose the phrase-table into sub-phrase-tables and construct separate clusters for source and target phrases. We convert the clusters into graphs, add smoothing/syntactic-information-carrier vertices, and compute the similarity between phrases with a random walk-based measure, the commute time.

10p bunthai_1 06-05-2013 42 3

Scientific report: "Feature-based Method for Document Alignment in Comparable News Corpora"

In this paper, we present a feature-based method to align documents with similar content across two sets of bilingual comparable corpora from daily news texts. We evaluate the contribution of each individual feature and investigate the incorporation of these diverse statistical and heuristic features for the task of bilingual document alignment. Experimental results on the English-Chinese and English-Malay comparable news corpora show that our proposed Discrete Fourier Transform-based term frequency distribution feature is very effective. ...

9p bunthai_1 06-05-2013 33 2

Scientific report: "Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation"

We introduce a word segmentation approach to languages where word boundaries are not orthographically marked, with application to Phrase-Based Statistical Machine Translation (PB-SMT). Instead of using manually segmented monolingual domain-specific corpora to train segmenters, we make use of bilingual corpora and statistical word alignment techniques. First of all, our approach is adapted for the specific translation task at hand by taking the corresponding source (target) language into account. ...

9p bunthai_1 06-05-2013 44 2

Scientific report: "Using Noisy Bilingual Data for Statistical Machine Translation"

SMT systems rely on a sufficient amount of parallel corpora to train the translation model. This paper investigates possibilities to use word-to-word and phrase-to-phrase translations extracted not only from clean parallel corpora but also from noisy comparable corpora. Translation results for a Chinese-to-English translation task are given.

4p bunthai_1 06-05-2013 36 2

Scientific report: "Automatic Construction of Machine Translation Knowledge Using Translation Literalness"

When machine translation (MT) knowledge is automatically constructed from bilingual corpora, redundant rules are acquired due to translation variety. These rules increase ambiguity or cause incorrect MT results. To overcome this problem, we constrain the sentences used for knowledge extraction to "the appropriate bilingual sentences for the MT." In this paper, we propose a method using translation literalness to select appropriate sentences or phrases.

8p bunthai_1 06-05-2013 51 1

Scientific report: "Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora"

Within the framework of translation knowledge acquisition from WWW news sites, this paper studies issues on the effect of cross-language retrieval of relevant texts in bilingual lexicon acquisition from comparable corpora. We experimentally show that it is quite effective to reduce the candidate bilingual term pairs against which bilingual term correspondences are estimated, in terms of both computational complexity and the performance of precise estimation of bilingual term correspondences.

8p bunthai_1 06-05-2013 38 1

Scientific report: "A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora"

We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained a 73.1% precision. We also show how the results can be used in the compilation of domain-specific noun phrases. ...

8p bunmoc_1 20-04-2013 39 2

Scientific report: "ALIGNING A PARALLEL ENGLISH-CHINESE CORPUS STATISTICALLY WITH LEXICAL CRITERIA"

We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-specific lexical cues.

8p bunmoc_1 20-04-2013 32 2

Scientific report: "AN ALGORITHM FOR FINDING NOUN PHRASE CORRESPONDENCES IN BILINGUAL CORPORA"

The paper describes an algorithm that employs English and French text taggers to associate noun phrases in an aligned bilingual corpus. The taggers provide part-of-speech categories which are used by finite-state recognizers to extract simple noun phrases for both languages. Noun phrases are then mapped to each other using an iterative re-estimation algorithm that bears similarities to the Baum-Welch algorithm which is used for training the taggers.

6p bunmoc_1 20-04-2013 40 1

Scientific report: "ALIGNING SENTENCES IN BILINGUAL CORPORA USING LEXICAL INFORMATION"

In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model.

8p bunmoc_1 20-04-2013 45 2

Scientific report: "A PROGRAM FOR ALIGNING SENTENCES IN BILINGUAL CORPORA"

Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts, texts such as the Canadian Hansards (parliamentary proceedings) which are available in multiple languages (French and English). This paper describes a method for aligning sentences in these parallel texts, based on a simple statistical model of character lengths. The method was developed and tested on a small trilingual sample of Swiss economic reports.

8p bunmoc_1 20-04-2013 56 3

Scientific report: "A Decoder for Syntax-based Statistical MT"

A statistical machine translation system based on the noisy channel model consists of three components: a language model (LM), a translation model (TM), and a decoder. For a system which translates from a foreign language f to English e, the LM gives a prior probability P(e) and the TM gives a channel translation probability P(f|e). These models are automatically trained using monolingual (for the LM) and bilingual (for the TM) corpora.

8p bunmoc_1 20-04-2013 41 3

Scientific report: "Mining the Web for Bilingual Text"

Text in parallel translation is a valuable resource in natural language processing. Statistical methods in machine translation (e.g., Brown et al., 1990) typically rely on large quantities of bilingual text aligned at the document or sentence level, and a number of approaches in the burgeoning field of cross-language information retrieval exploit parallel corpora either in place of or in addition to mappings between languages based on information from bilingual dictionaries (Davis and Dunning, 1995; Landauer and Littman, 1990; Hull and Oard, 1997; Oard, 1997). ...

8p bunrieu_1 18-04-2013 41 2