![](images/graphics/blank.gif)
Annotation errors
-
The accurate determination of the genomic coordinates for a given gene – its gene model – is of vital importance to the utility of its annotation, and the accuracy of bioinformatic analyses derived from it. Currentlyavailable methods of computational gene prediction, while on the whole successful, frequently disagree on the model for a given predicted gene, with some or all of the variant gene models often failing to match the biologically observed structure. Many prediction methods can be bolstered by using experimental data such as RNA-seq.
18p
vibeauty
23-10-2021
7
1
Download
-
Although animal mitochondrial DNA sequences are known to evolve rapidly, their gene arrangements often remain unchanged over long periods of evolutionary time. Therefore, comparisons of mitochondrial genomes may result in significant insights into the evolution both of organisms and of genomes.
8p
vigiselle2711
30-08-2021
13
1
Download
-
The size of the protein sequence database has been exponentially increasing due to advances in genome sequencing. However, experimentally characterized proteins only constitute a small portion of the database, such that the majority of sequences have been annotated by computational approaches. Current automatic annotation pipelines inevitably introduce errors, making the annotations unreliable.
7p
viwyoming2711
16-12-2020
13
1
Download
-
Cilia are hairlike structures protruding from nearly every cell in the body. Diseases known as ciliopathies where cilia function is disrupted can result in a wide spectrum of diseases. However, most techniques for assessing ciliary motion rely on manual identification and tracking of cilia. This annotation is tedious and error-prone and more analytical techniques impose strong assumptions such as periodic motion of beating cilia.
8p
vigeorgia2711
01-12-2020
9
1
Download
-
With the advance of next generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, it has been found that a non-negligible proportion of the identified indel variants might be false positives due to sequencing errors, artifacts caused by ambiguous alignments, and annotation errors.
10p
vikentucky2711
26-11-2020
11
0
Download
-
Recent advances in sequencing technologies have led to an explosion in the number of genomes available, but accurate genome annotation remains a major challenge.
16p
vikentucky2711
24-11-2020
11
1
Download
-
The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity.
17p
vioklahoma2711
19-11-2020
6
1
Download
-
Concerning different approaches to automatic PoS tagging: EngCG-2, a constraintbased morphological tagger, is compared in a double-blind test with a state-of-the-art statistical tagger on a common disambiguation task using a common tag set. The experiments show that for the same amount of remaining ambiguity, the error rate of the statistical tagger is one order of magnitude greater than that of the rule-based one. The two related issues of priming effects compromising the results and disagreement between human annotators are also addressed. ...
8p
bunthai_1
06-05-2013
48
3
Download
-
We apply machine learning techniques to classify automatically a set of verbs into lexical semantic classes, based on distributional approximations of diatheses, extracted from a very large annotated corpus. Distributions of four grammatical features are sufficient to reduce error rate by 50% over chance. We conclude that corpus data is a usable repository of verb class information, and that corpus-driven extraction of grammatical features is a promising methodology for automatic lexical acquisition. ...
8p
bunthai_1
06-05-2013
45
3
Download
-
The well-studied supervised Relation Extraction algorithms require training data that is accurate and has good coverage. To obtain such a gold standard, the common practice is to do independent double annotation followed by adjudication. This takes significantly more human effort than annotation done by a single annotator.
10p
bunthai_1
06-05-2013
48
2
Download
-
Hungarian is a stereotype of morphologically rich and non-configurational languages. Here, we introduce results on dependency parsing of Hungarian that employ a 80K, multi-domain, fully manually annotated corpus, the Szeged Dependency Treebank. We show that the results achieved by state-of-the-art data-driven parsers on Hungarian and English (which is at the other end of the configurational-nonconfigurational spectrum) are quite similar to each other in terms of attachment scores.
11p
bunthai_1
06-05-2013
58
3
Download
-
This paper describes POS tagging experiments with semi-supervised training as an extension to the (supervised) averaged perceptron algorithm, first introduced for this task by (Collins, 2002). Experiments with an iterative training on standard-sized supervised (manually annotated) dataset (106 tokens) combined with a relatively modest (in the order of 108 tokens) unsupervised (plain) data in a bagging-like fashion showed significant improvement of the POS classification task on typologically different languages, yielding better than state-of-the-art results for English and Czech (4.
9p
bunthai_1
06-05-2013
37
1
Download
-
The quality of the part-of-speech (PoS) annotation in a corpus is crucial for the development of PoS taggers. In this paper, we experiment with three complementary methods for automatically detecting errors in the PoS annotation for the Icelandic Frequency Dictionary corpus. The first two methods are language independent and we argue that the third method can be adapted to other morphologically complex languages. Once possible errors have been detected, we examine each error candidate and hand-correct the corresponding PoS tag if necessary. ...
9p
bunthai_1
06-05-2013
56
1
Download
-
Building on work detecting errors in dependency annotation, we set out to correct local dependency errors. To do this, we outline the properties of annotation errors that make the task challenging and their existence problematic for learning. For the task, we define a feature-based model that explicitly accounts for non-relations between words, and then use ambiguities from one model to constrain a second, more relaxed model. In this way, we are successfully able to correct many errors, in a way which is potentially applicable to dependency parsing more generally. ...
9p
bunthai_1
06-05-2013
49
2
Download
-
Faced with the problem of annotation errors in part-of-speech (POS) annotated corpora, we develop a method for automatically correcting such errors. Building on top of a successful error detection method, we first try correcting a corpus using two off-the-shelf POS taggers, based on the idea that they enforce consistency; with this, we find some improvement. After some discussion of the tagging process, we alter the tagging model to better account for problematic tagging distinctions. This modification results in significantly improved performance, reducing the error rate of the corpus. ...
8p
bunthai_1
06-05-2013
45
2
Download
-
We propose a new method for detecting errors in "gold-standard" part-ofspeech annotation. The approach locates errors with high precision based on n-grams occurring in the corpus with multiple taggings. Two further techniques, closed-class analysis and finitestate tagging guide patterns, are discussed. The success of the three approaches is illustrated for the Wall Street Journal corpus as part of the Penn Treebank.
8p
bunthai_1
06-05-2013
39
2
Download
-
We use the grammatical relations (GRs) described in Carroll et al. (1998) to compare a number of parsing algorithms A first ranking of the parsers is provided by comparing the extracted GRs to a gold standard GR annotation of 500 Susanne sentences: this required an implementation of GR extraction software for Penn Treebank style parsers. In addition, we perform an experiment using the extracted GRs as input to the Lappin and Leass (1994) anaphora resolution algorithm.
8p
bunthai_1
06-05-2013
56
2
Download
-
Statistical methods require very large corpus with high quality. But building large and faultless annotated corpus is a very difficult job. This paper proposes an efficient m e t h o d to construct part-of-speech tagged corpus. A rulebased error correction m e t h o d is proposed to find and correct errors semi-automatically by user-defined rules. We also make use of user's correction log to reflect feedback. Experiments were carried out to show the efficiency of error correction process of this workbench. The result shows that about 63.2 % of tagging errors can be corrected. ...
5p
bunrieu_1
18-04-2013
46
3
Download
-
We have developed willex, a tool that helps grammar developers to work efficiently by using annotated corpora and recording parsing errors. Willex has two major new functions. First, it decreases ambiguity of the parsing results by comparing them to an annotated corpus and removing wrong partial results both automatically and manually. Second, willex accumulates parsing errors as data for the developers to clarify the defects of the grammar statistically. We applied willex to a large-scale HPSG-style grammar as an example. ...
4p
bunbo_1
17-04-2013
55
1
Download
-
Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE) evaluation. Here we present our general approach and describe our ACE results.
4p
bunbo_1
17-04-2013
49
1
Download
CHỦ ĐỀ BẠN MUỐN TÌM
![](images/graphics/blank.gif)