Báo cáo khoa học: "Analysing Wikipedia and Gold-Standard Corpora for NER Training"
39
lượt xem 2
download
lượt xem 2
download
Download
Vui lòng tải xuống để xem tài liệu đầy đủ
Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named entity annotated text. We present the first comprehensive crosscorpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making them more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold standard corpora on crosscorpus evaluation by up to 11%. ...
Chủ đề:
Bình luận(0) Đăng nhập để gửi bình luận!
CÓ THỂ BẠN MUỐN DOWNLOAD