Corpus in Japanese

Showing 1-14 of 14 results for "Corpus in Japanese"
  • This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to use the two methods to tag a large spontaneous speech corpus accurately. The first method detects any type of word segment. The second method is used when there are several definitions of word segments and their POS categories, and when one type of word segment contains another.

    PDF, 10 pages · bunbo_1 · 17-04-2013 · 32 · 1

  • This paper describes an approach to extracting the aspectual information of Japanese verb phrases from a monolingual corpus. We classify verbs into six categories using aspectual features, which are defined by whether a verb can co-occur with particular aspectual forms and adverbs. A unique category could be identified for 96% of the target verbs. To evaluate the result, we examined the meaning of -teiru, one of the most fundamental aspectual markers in Japanese, and obtained a correct recognition rate of 71% on 200 sentences. ...

    PDF, 8 pages · bunthai_1 · 06-05-2013 · 38 · 4

  • Conventional sentence compression methods employ a syntactic parser to compress a sentence without changing its meaning. However, the reference compressions made by humans do not always retain the syntactic structures of the original sentences. Moreover, for on-demand sentence compression, the time spent in the parsing stage is not negligible.

    PDF, 8 pages · hongphan_1 · 14-04-2013 · 32 · 2

  • We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignments.

    PDF, 8 pages · bunbo_1 · 17-04-2013 · 44 · 2

  • Named entity (NE) recognition is a task in which proper nouns and numerical information in a document are detected and classified into categories such as person, organization, location, and date. NE recognition plays an essential role in information extraction systems and question answering systems. It is well known that hand-crafted systems with a large set of heuristic rules are difficult to maintain, and corpus-based statistical approaches are expected to be more robust and require less human intervention. ...

    PDF, 8 pages · bunrieu_1 · 18-04-2013 · 19 · 2

  • We describe an algorithm for Japanese analysis that performs base phrase chunking and dependency parsing simultaneously, in linear time, with a single scan of a sentence. In this paper, we present pseudocode for the algorithm and evaluate its performance empirically on the Kyoto University Corpus. Experimental results show that the proposed algorithm with the voted perceptron yields reasonably good accuracy.

    PDF, 4 pages · hongphan_1 · 15-04-2013 · 22 · 1

  • This paper reports the corpus-oriented development of a wide-coverage Japanese HPSG parser. We first created an HPSG treebank from the EDR corpus by using heuristic conversion rules, and then extracted lexical entries from the treebank. The grammar developed using this method attained wide coverage that could hardly be obtained by conventional manual development. We also trained a statistical parser for the grammar on the treebank, and evaluated the parser in terms of the accuracy of semantic-role identification and dependency analysis. ...

    PDF, 6 pages · bunbo_1 · 17-04-2013 · 26 · 1

  • Multilingual applications frequently involve dealing with proper names, but names are often missing in bilingual lexicons. This problem is exacerbated for applications involving translation between Latin-scripted languages and Asian languages such as Chinese, Japanese and Korean (CJK) where simple string copying is not a solution. We present a novel approach for generating the ideographic representations of a CJK name written in a Latin script.

    PDF, 8 pages · bunbo_1 · 17-04-2013 · 33 · 1

  • This paper describes a method of detecting grammatical and lexical errors made by Japanese learners of English and other techniques that improve the accuracy of error detection with a limited amount of training data. In this paper, we demonstrate to what extent the proposed methods hold promise by conducting experiments using our learner corpus, which contains information on learners’ errors.

    PDF, 4 pages · bunbo_1 · 17-04-2013 · 34 · 1

  • It is implemented in Prolog in the interests of rapid prototyping, but is intended for later optimization. For development purposes we are using an existing corpus of 10,000 words of continuous prose from the PERQ's graphics documentation; in the long term, the system will be extended for use by technical writers in fields other than software, and possibly to other languages.

    PDF, 5 pages · buncha_1 · 08-05-2013 · 34 · 1

  • This paper describes our preliminary attempt to automatically recognize zero adnominals, a subgroup of zero pronouns, in Japanese discourse. Based on a corpus study, we define and classify what we call "argument-taking nouns (ATNs)", i.e., nouns that can appear with zero adnominals. We propose an ATN recognition algorithm that consists of lexicon-based heuristics drawn from the observations of our analysis. Finally, we present the results of the algorithm evaluation and discuss future directions. ...

    PDF, 8 pages · bunbo_1 · 17-04-2013 · 37 · 1

  • This paper describes novel and practical Japanese parsers that use decision trees. First, we construct a single decision tree to estimate modification probabilities, that is, how likely one phrase is to modify another. Next, we introduce a boosting algorithm in which several decision trees are constructed and then combined for probability estimation. The two resulting parsers are evaluated on the EDR Japanese annotated corpus. The single-tree method outperforms conventional Japanese stochastic methods by 4%. ...

    PDF, 7 pages · bunrieu_1 · 18-04-2013 · 21 · 5

  • This paper describes an alternative translation model based on text chunks within the framework of statistical machine translation. The proposed translation model first performs chunking. Then, each word in a chunk is translated. Finally, the translated chunks are reordered. Using this translation model, we experimented on a broad-coverage Japanese-English travel corpus and achieved improved performance.

    PDF, 8 pages · bunbo_1 · 17-04-2013 · 29 · 2

  • Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their English translations inside a pair of parentheses. We present a method that extracts such translations from a large collection of web documents by building a partially parallel corpus and using a word alignment algorithm to identify the terms being translated. The method is able to generalize across the translations of different terms and can reliably extract translations that occur only once on the entire web. ...

    PDF, 9 pages · hongphan_1 · 15-04-2013 · 28 · 1
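The Japanese-English news alignment abstract above uses dynamic programming (DP) matching to align sentences inside article pairs that have already been matched. The abstract does not say what score the DP optimises. This minimal sketch therefore substitutes a simple length-based cost in the spirit of Gale and Church. `SKIP_COST`, `match_cost`, and the `ratio` parameter, which would scale Japanese character counts, are hypothetical choices and are not the authors'. The sketch allows only 1-1, 1-0 and 0-1 steps.

```python
from typing import List, Optional, Tuple

# Hypothetical penalty for leaving one sentence unaligned (a 1-0 or 0-1 step).
SKIP_COST = 3.0


def match_cost(ja_len: int, en_len: int, ratio: float) -> float:
    """Relative length mismatch after scaling the Japanese length by `ratio`."""
    scaled = ja_len * ratio
    return abs(scaled - en_len) / max(scaled, en_len, 1.0)


def align(ja: List[str], en: List[str], ratio: float = 1.0) -> List[Tuple[int, int]]:
    """Return 1-1 sentence pairs (ja_index, en_index) on the cheapest DP path."""
    n, m = len(ja), len(en)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # 1-1: pair ja[i-1] with en[j-1]
                c = cost[i - 1][j - 1] + match_cost(len(ja[i - 1]), len(en[j - 1]), ratio)
                candidates.append((c, (i - 1, j - 1)))
            if i > 0:  # 1-0: ja[i-1] has no English counterpart
                candidates.append((cost[i - 1][j] + SKIP_COST, (i - 1, j)))
            if j > 0:  # 0-1: en[j-1] has no Japanese counterpart
                candidates.append((cost[i][j - 1] + SKIP_COST, (i, j - 1)))
            cost[i][j], back[i][j] = min(candidates)
    # Trace back from the bottom-right cell, keeping only the 1-1 steps.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((i - 1, j - 1))
        i, j = pi, pj
    return pairs[::-1]
```

For example, `align(["aaaa", "bb", "cccccc"], ["xxxx", "extra sentence here", "yy", "zzzzzz"])` returns `[(0, 0), (1, 2), (2, 3)]` and leaves the extra English sentence unaligned. The table fills each cell once, so the cost is O(nm) in the number of sentences.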

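The linear-time chunking and dependency abstract above relies on two properties of Japanese dependencies: they are head-final and they do not cross. The sketch below covers only the dependency half and assumes the sentence is already split into base phrases (bunsetsu). A stack holds the phrases still waiting for a head, and one left-to-right scan pops each phrase once its head is found. The `depends` callback is a hypothetical stand-in for the paper's trained classifier, and the code is not the published pseudocode.

```python
from typing import Callable, List, Sequence

# depends(chunks, i, j) -> True if chunk i should modify chunk j (i < j).
# In the paper a trained classifier (the voted perceptron) plays this role;
# here it is any caller-supplied function (hypothetical interface).
DependsFn = Callable[[Sequence[str], int, int], bool]


def parse(chunks: Sequence[str], depends: DependsFn) -> List[int]:
    """Assign each chunk the index of its head; the last chunk gets -1.

    Assumes head-final, non-crossing dependencies: every chunk except the
    last modifies some later chunk.
    """
    heads = [-1] * len(chunks)
    stack: List[int] = []  # chunks still waiting for a head
    last = len(chunks) - 1
    for j in range(len(chunks)):
        # Pop every waiting chunk that attaches to j; the final chunk
        # takes whatever is left on the stack.
        while stack and (j == last or depends(chunks, stack[-1], j)):
            heads[stack.pop()] = j
        stack.append(j)
    return heads
```

Using an oracle built from gold heads, `parse(["彼は", "赤い", "本を", "読んだ"], lambda c, i, j: [3, 2, 3, -1][i] == j)` returns `[3, 2, 3, -1]`. Each phrase is pushed once and popped at most once. Every `depends` call either pops a phrase or ends the inner loop, so the scan makes O(n) classifier calls.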

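The chunk-based translation abstract above describes three steps: chunk the input, translate the words inside each chunk, and reorder the chunks. The toy sketch below wires those steps together. The particle-based chunker, the five-entry `LEXICON`, and the single move-the-verb reordering rule are all hypothetical stand-ins for the statistical models the paper trains.

```python
from typing import Dict, List, Optional

# Hypothetical chunk-final particles used by the toy chunker.
PARTICLES = {"は", "が", "を", "に"}

# Hypothetical word-level lexicon; None means "no English word".
LEXICON: Dict[str, Optional[str]] = {
    "私": "I", "は": None, "本": "books", "を": None, "読んだ": "read",
}


def chunk(tokens: List[str]) -> List[List[str]]:
    """Step 1: close a chunk after each particle."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in PARTICLES:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def translate_words(chunks: List[List[str]]) -> List[List[str]]:
    """Step 2: translate each word within its chunk, dropping words mapped to None."""
    out = []
    for ch in chunks:
        words = [LEXICON.get(tok, tok) for tok in ch]
        out.append([w for w in words if w is not None])
    return out


def reorder(chunks: List[List[str]]) -> List[List[str]]:
    """Step 3 (toy rule): move the final, verb-bearing chunk to second place (SOV -> SVO)."""
    if len(chunks) < 3:
        return chunks
    return [chunks[0], chunks[-1]] + chunks[1:-1]


def translate(tokens: List[str]) -> str:
    chunks = reorder(translate_words(chunk(tokens)))
    return " ".join(w for ch in chunks for w in ch)
```

`translate(["私", "は", "本", "を", "読んだ"])` returns `"I read books"`. In a real system each step would be scored probabilistically, and the chunk order would be chosen by search rather than by a fixed rule.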

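The parenthetical-translation abstract above starts from CJK text that glosses a term with English in parentheses. A first step could be a regular expression that collects candidate pairs, as sketched below. Both the pattern and its character ranges are assumptions. The ranges cover only kana and CJK Unified Ideographs, so Hangul is not matched. The hard part is the left edge of the term, because the pattern captures the entire preceding CJK run. That is why the paper identifies the term with word alignment over a partially parallel corpus rather than with a pattern alone.

```python
import re
from typing import List, Tuple

# A run of kana / CJK ideographs followed by a parenthesised Latin-script phrase.
# Both ASCII "( )" and full-width "（ ）" parentheses are accepted.
CANDIDATE = re.compile(
    r"([\u3040-\u30ff\u4e00-\u9fff]+)\s*[(（]\s*([A-Za-z][A-Za-z .\-]*?)\s*[)）]"
)


def candidates(text: str) -> List[Tuple[str, str]]:
    """Return (preceding CJK run, parenthesised English) candidate pairs."""
    return [(m.group(1), m.group(2)) for m in CANDIDATE.finditer(text)]
```

For instance, `candidates("研究では自然言語処理（NLP）を使う")` returns `[("研究では自然言語処理", "NLP")]`. The Japanese side of that candidate is longer than the actual term 自然言語処理, which is exactly the ambiguity the alignment step has to resolve.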