
String similarity

Showing 1-20 of 29 results for String similarity
  • The Damerau-Levenshtein (DL) distance metric has been widely used in the biological sciences. It identifies similar regions of DNA, RNA, and protein sequences by transforming one sequence into another using substitution, insertion, deletion, and transposition operations.

    pdf21p viwyoming2711 16-12-2020 22 1   Download

  • In this study we consider DNA sequences as mathematical strings. Total and reduced alignments between two DNA sequences have been considered in the literature to measure their similarity. Results for explicit representations of some alignments have already been obtained.

    pdf5p vikentucky2711 26-11-2020 7 1   Download

  • Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP.

    pdf11p vioklahoma2711 19-11-2020 10 2   Download

  • The amino acid sequence of a protein is the blueprint from which its structure and ultimately function can be derived. Therefore, sequence comparison methods remain essential for the determination of similarity between proteins.

    pdf15p vioklahoma2711 19-11-2020 11 0   Download

  • Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work, however, needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little similarity.

    pdf14p viflorida2711 30-10-2020 13 1   Download

  • Long non-coding RNAs (lncRNAs) represent a novel class of non-coding RNAs having a crucial role in many biological processes. The identification of long non-coding homologs among different species is essential to investigate such roles in model organisms as homologous genes tend to retain similar molecular and biological functions.

    pdf12p vicoachella2711 27-10-2020 12 0   Download

  • According to the drilling program approved for Hai Thach field, the drilling section below the 16” casing liner (14.85” internal diameter) will be carried out by two separate BHAs: first drilling the 12.25” section by PDC bit to the section target, then under-reaming the wellbore to 14.5” and 16.5” diameter in order to run 13.625” casing string. Using two separate BHAs for reaming the wellbore certainly leads to a time increase in the run in hole (RIH) and pull out of the hole (POOH) of the drill-string and hence the associated costs such as rig and other related third party services.

    pdf8p kequaidan6 10-07-2020 13 0   Download

  • Describes how strings are a first-class type in the CLR and how to use them effectively in C#. A large portion of the chapter covers the string-formatting capabilities of various types in the .NET Framework and how to make your defined types behave similarly by implementing IFormattable.

    pdf52p tangtuy20 28-07-2016 33 2   Download

  • Document revision histories are a useful and abundant source of data for natural language processing, but selecting relevant data for the task at hand is not trivial. In this paper we introduce a scalable approach for automatically distinguishing between factual and fluency edits in document revision histories.

    pdf11p bunthai_1 06-05-2013 56 2   Download

  • In this paper, a word alignment approach is presented which is based on a combination of clues. Word alignment clues indicate associations between words and phrases. They can be based on features such as frequency, part-of-speech, phrase type, and the actual wordform strings. Clues can be found by calculating similarity measures or learned from word aligned data. The clue alignment approach, which is proposed in this paper, makes it possible to combine association clues taking different kinds of linguistic information into account. ...

    pdf8p bunthai_1 06-05-2013 50 3   Download

  • Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the contextual behavior of words. The performance of this system is shown to improve over that of the purely string-based TextTiling algorithm (Hearst, 1997). ...

    pdf5p bunrieu_1 18-04-2013 56 3   Download

  • This paper compares different measures of graphemic similarity applied to the task of bilingual lexicon induction between a Swiss German dialect and Standard German. The measures have been adapted to this particular language pair by training stochastic transducers with the Expectation-Maximisation algorithm or by using handmade transduction rules. These adaptive metrics show up to 11% F-measure improvement over a static metric like Levenshtein distance.

    pdf6p hongvang_1 16-04-2013 43 3   Download

  • We propose a bootstrapping approach to training a memoryless stochastic transducer for the task of extracting transliterations from an English-Arabic bitext. The transducer learns its similarity metric from the data in the bitext, and thus can function directly on strings written in different writing scripts without any additional language knowledge. We show that this bootstrapped transducer performs as well as or better than a model designed specifically to detect Arabic-English transliterations. ...

    pdf8p hongvang_1 16-04-2013 46 1   Download

  • A character-based measure of similarity is an important component of many natural language processing systems, including approaches to transliteration, coreference, word alignment, spelling correction, and the identification of cognates in related vocabularies. We propose an alignment-based discriminative framework for string similarity. We gather features from substring pairs consistent with a character-based alignment of the two strings.

    pdf8p hongvang_1 16-04-2013 49 2   Download

  • We show that we can automatically classify semantically related phrases into 10 classes. Classification robustness is improved by training with multiple sources of evidence, including within-document cooccurrence, HTML markup, syntactic relationships in sentences, substitutability in query logs, and string similarity. Our work provides a benchmark for automatic n-way classification into WordNet’s semantic classes, both on a TREC news corpus and on a corpus of substitutable search query phrases. ...

    pdf8p hongvang_1 16-04-2013 51 1   Download

  • This paper presents three methods that can be used to recognize paraphrases. They all employ string similarity measures applied to shallow abstractions of the input sentences, and a Maximum Entropy classifier to learn how to combine the resulting features. Two of the methods also exploit WordNet to detect synonyms and one of them also exploits a dependency parser. We experiment on two datasets, the MSR paraphrasing corpus and a dataset that we automatically created from the MTC corpus. Our system achieves state of the art or better results. ...

    pdf9p hongphan_1 15-04-2013 50 1   Download

  • We describe a set of techniques for Arabic cross-document coreference resolution. We compare a baseline system of exact mention string-matching to ones that include local mention context information as well as information from an existing machine translation system. It turns out that the machine translation-based technique outperforms the baseline, but local entity context similarity does not. This helps to point the way for future cross-document coreference work in languages with few existing resources for the task. ...

    pdf4p hongphan_1 15-04-2013 38 2   Download

  • This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection. ...

    pdf6p hongdo_1 12-04-2013 41 2   Download

  • To address the parse error issue for tree-to-string translation, this paper proposes a similarity-based decoding generation (SDG) solution that reconstructs similar source parse trees at decoding time instead of taking multiple source parse trees as input. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method, and has little impact on decoding speed in practice.

    pdf6p hongdo_1 12-04-2013 40 2   Download

  • This paper proposes a new method for approximate string search, specifically candidate generation in spelling error correction, a task defined as follows: given a misspelled word, the system finds the words in a dictionary that are most “similar” to it. The paper proposes a probabilistic approach to the task, which is both accurate and efficient. The approach includes the use of a log linear model, a method for training the model, and an algorithm for finding the top k candidates. ...

    pdf10p hongdo_1 12-04-2013 49 4   Download
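
Two of the abstracts listed above rely on edit-distance-style matching: the Damerau-Levenshtein entry (substitution, insertion, deletion, and transposition) and the approximate string search entry (finding the top k dictionary words closest to a misspelling). The sketch below illustrates both ideas under stated assumptions. It implements the restricted "optimal string alignment" variant of Damerau-Levenshtein, not the unrestricted distance, and it ranks candidates by a brute-force scan, which is a plain baseline and not the log linear model that the paper proposes. The function names `osa_distance` and `top_k_candidates` and the sample word list are hypothetical, chosen only for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts the minimum number of substitutions, insertions, deletions,
    and adjacent transpositions, where no substring is edited twice.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]

    # Transforming a prefix into the empty string costs one edit per character.
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ie" <-> "ei".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)

    return d[-1][-1]


def top_k_candidates(word: str, dictionary: list[str], k: int = 3) -> list[str]:
    """Return the k dictionary words closest to `word`.

    Brute-force baseline: scores every entry, breaking ties alphabetically.
    """
    ranked = sorted(dictionary, key=lambda w: (osa_distance(word, w), w))
    return ranked[:k]


if __name__ == "__main__":
    words = ["receive", "recipe", "relieve", "deceive"]  # hypothetical word list
    print(osa_distance("recieve", "receive"))
    print(top_k_candidates("recieve", words, k=2))
```

The full scan costs one distance computation per dictionary entry, which is acceptable for small word lists. Larger dictionaries would need an index or a pruning strategy, and that is the efficiency concern the approximate string search abstract refers to.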
