Parsing the LOB corpus
-
The Constituent Likelihood Automatic Word-tagging System (CLAWS) was originally designed for the low-level grammatical analysis of the million-word LOB Corpus of English text samples. CLAWS does not attempt a full parse, but uses a first-order Markov model of language to assign word-class labels to words. CLAWS can be modified to detect grammatical errors, essentially by flagging unlikely word-class transitions in the input text.
8p
buncha_1
08-05-2013
42
3
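
The idea in the CLAWS abstract above, flagging unlikely word-class transitions under a first-order Markov model, can be illustrated with a minimal sketch. This is not the CLAWS implementation: the function names, the toy training sentences, the simplified tag labels (`DET`, `NOUN`, ...) and the `threshold` value are all assumptions made for illustration.

```python
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker, not a CLAWS tag


def train_transitions(tagged_sentences):
    """Estimate P(next_tag | prev_tag) from tag bigram counts.

    Returns a flat dict keyed by (prev_tag, next_tag).
    """
    pair_counts = Counter()
    prev_counts = Counter()
    for sentence in tagged_sentences:
        tags = [START] + [tag for _, tag in sentence]
        for prev, nxt in zip(tags, tags[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}


def flag_unlikely(tags, probs, threshold=0.05):
    """Return the positions whose incoming tag transition falls below threshold."""
    padded = [START] + list(tags)
    flagged = []
    for i, (prev, nxt) in enumerate(zip(padded, padded[1:])):
        # Unseen transitions get probability 0.0 (no smoothing in this sketch).
        if probs.get((prev, nxt), 0.0) < threshold:
            flagged.append(i)
    return flagged


# Toy training data with simplified tags (assumed, not the LOB tagset).
training = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
probs = train_transitions(training)

ok = flag_unlikely(["DET", "NOUN", "VERB"], probs)
suspicious = flag_unlikely(["DET", "VERB", "NOUN"], probs)
```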
-
The UCREL team at the University of Lancaster is developing a robust parsing mechanism that will assign the appropriate grammatical structure to sentences in unconstrained English text. The techniques involve calculating probabilities for competing structures and are based on those used successfully in tagging the LOB (Lancaster-Oslo/Bergen) corpus, i.e. assigning grammatical word classes to its words.
5p
buncha_1
08-05-2013
40
2
-
This paper presents a rapid and robust parsing system currently used to learn from large bodies of unedited text. The system contains a multivalued part-of-speech disambiguator and a novel parser employing bottom-up recognition to find the constituent phrases of larger structures that might be too difficult to analyze. The results of applying the disambiguator and parser to large sections of the Lancaster-Oslo/Bergen corpus are presented. INTRODUCTION: We have implemented and tested a parsing system which is rapid and robust enough to apply to large bodies of unedited text. ...
9p
bungio_1
03-05-2013
53
1