zunia.vn

String similarity

Showing 1-20 of 29 results for String similarity

Linear space string correction algorithm using the Damerau-Levenshtein distance

The Damerau-Levenshtein (DL) distance metric has been widely used in the biological sciences. It identifies similar regions of DNA, RNA, and protein sequences by transforming one sequence into another using substitution, insertion, deletion, and transposition operations.

21p viwyoming2711 16-12-2020 22 1 Download
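
As a rough illustration of the four edit operations named in the abstract above, the following is a minimal sketch of the restricted Damerau-Levenshtein distance (often called optimal string alignment). It uses a full quadratic-space table and is not the linear-space algorithm of the listed paper; the function name `osa_distance` is an assumption made for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Illustrative sketch only: full O(len(a) * len(b)) table, unit costs.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(osa_distance("ACGT", "AGCT"))  # one adjacent transposition
```

In this restricted variant a substring may not be edited again after it has been transposed, so some pairs score higher than under the unrestricted Damerau-Levenshtein distance.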

The number of reduced alignments between two DNA sequences

In this study we consider DNA sequences as mathematical strings. Total and reduced alignments between two DNA sequences have been considered in the literature to measure their similarity. Results for explicit representations of some alignments have already been obtained.

5p vikentucky2711 26-11-2020 7 1 Download

A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction

Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP.

11p vioklahoma2711 19-11-2020 10 2 Download
String kernels for protein sequence comparisons: Improved fold recognition

The amino acid sequence of a protein is the blueprint from which its structure and ultimately function can be derived. Therefore, sequence comparison methods remain essential for the determination of similarity between proteins.

15p vioklahoma2711 19-11-2020 11 0 Download
A weighted string kernel for protein fold recognition

Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work, however, needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little similarity.

14p viflorida2711 30-10-2020 13 1 Download
Detection of long non-coding RNA homology, a comparative study on alignment and alignment-free metrics

Long non-coding RNAs (lncRNAs) represent a novel class of non-coding RNAs having a crucial role in many biological processes. The identification of long non-coding homologs among different species is essential to investigate such roles in model organisms as homologous genes tend to retain similar molecular and biological functions.

12p vicoachella2711 27-10-2020 12 0 Download
Optimising the under-reamer string design for wells at Hai Thach field, Nam Con Son basin

According to the drilling program approved for Hai Thach field, the drilling section below the 16” casing liner (14.85” internal diameter) will be carried out by two separate BHAs: first drilling the 12.25” section by PDC bit to the section target, then under-reaming the wellbore to 14.5” and 16.5” diameter in order to run 13.625” casing string. Using two separate BHAs for reaming the wellbore certainly leads to a time increase in the run in hole (RIH) and pull out of the hole (POOH) of the drill-string and hence the associated costs such as rig and other related third party services.

8p kequaidan6 10-07-2020 13 0 Download
Lecture Mastering C# - Chapter 12: Working with String

Describes how strings are a first-class type in the CLR and how to use them effectively in C#. A large portion of the chapter covers the string-formatting capabilities of various types in the .NET Framework and how to make your defined types behave similarly by implementing IFormattable.

52p tangtuy20 28-07-2016 33 2 Download
Scientific report: "User Edits Classification Using Document Revision Histories"

Document revision histories are a useful and abundant source of data for natural language processing, but selecting relevant data for the task at hand is not trivial. In this paper we introduce a scalable approach for automatically distinguishing between factual and fluency edits in document revision histories.

11p bunthai_1 06-05-2013 56 2 Download
Scientific report: "Combining Clues for Word Alignment"

In this paper, a word alignment approach is presented which is based on a combination of clues. Word alignment clues indicate associations between words and phrases. They can be based on features such as frequency, part-of-speech, phrase type, and the actual wordform strings. Clues can be found by calculating similarity measures or learned from word aligned data. The clue alignment approach, which is proposed in this paper, makes it possible to combine association clues taking different kinds of linguistic information into account. ...

8p bunthai_1 06-05-2013 50 3 Download
Scientific report: "Cohesion and Collocation: Using Context Vectors in Text Segmentation"

Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the contextual behavior of words. The performance of this system is shown to improve over that of the purely string-based TextTiling algorithm (Hearst, 1997). ...

5p bunrieu_1 18-04-2013 56 3 Download
Scientific report: "Adaptive String Distance Measures for Bilingual Dialect Lexicon Induction"

This paper compares different measures of graphemic similarity applied to the task of bilingual lexicon induction between a Swiss German dialect and Standard German. The measures have been adapted to this particular language pair by training stochastic transducers with the Expectation-Maximisation algorithm or by using handmade transduction rules. These adaptive metrics show up to 11% F-measure improvement over a static metric like Levenshtein distance.

6p hongvang_1 16-04-2013 43 3 Download
Scientific report: "Bootstrapping a Stochastic Transducer for Arabic-English Transliteration Extraction"

We propose a bootstrapping approach to training a memoryless stochastic transducer for the task of extracting transliterations from an English-Arabic bitext. The transducer learns its similarity metric from the data in the bitext, and thus can function directly on strings written in different writing scripts without any additional language knowledge. We show that this bootstrapped transducer performs as well as or better than a model designed specifically to detect Arabic-English transliterations. ...

8p hongvang_1 16-04-2013 46 1 Download
Scientific report: "Alignment-Based Discriminative String Similarity"

A character-based measure of similarity is an important component of many natural language processing systems, including approaches to transliteration, coreference, word alignment, spelling correction, and the identification of cognates in related vocabularies. We propose an alignment-based discriminative framework for string similarity. We gather features from substring pairs consistent with a character-based alignment of the two strings.

8p hongvang_1 16-04-2013 49 2 Download
Scientific report: "N Semantic Classes are Harder than Two"

We show that we can automatically classify semantically related phrases into 10 classes. Classification robustness is improved by training with multiple sources of evidence, including within-document co-occurrence, HTML markup, syntactic relationships in sentences, substitutability in query logs, and string similarity. Our work provides a benchmark for automatic n-way classification into WordNet’s semantic classes, both on a TREC news corpus and on a corpus of substitutable search query phrases. ...

8p hongvang_1 16-04-2013 51 1 Download
Scientific report: "Paraphrase Recognition Using Machine Learning to Combine Similarity Measures"

This paper presents three methods that can be used to recognize paraphrases. They all employ string similarity measures applied to shallow abstractions of the input sentences, and a Maximum Entropy classifier to learn how to combine the resulting features. Two of the methods also exploit WordNet to detect synonyms and one of them also exploits a dependency parser. We experiment on two datasets, the MSR paraphrasing corpus and a dataset that we automatically created from the MTC corpus. Our system achieves state-of-the-art or better results. ...

9p hongphan_1 15-04-2013 50 1 Download
Scientific report: "Arabic Cross-Document Coreference Detection"

We describe a set of techniques for Arabic cross-document coreference resolution. We compare a baseline system of exact mention string-matching to ones that include local mention context information as well as information from an existing machine translation system. It turns out that the machine translation-based technique outperforms the baseline, but local entity context similarity does not. This helps to point the way for future cross-document coreference work in languages with few existing resources for the task.

4p hongphan_1 15-04-2013 38 2 Download
Scientific report: "Using Derivation Trees for Treebank Error Detection"

This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection. ...

6p hongdo_1 12-04-2013 41 2 Download
Scientific report: "Improving Decoding Generalization for Tree-to-String Translation"

To address the parse error issue for tree-to-string translation, this paper proposes a similarity-based decoding generation (SDG) solution that reconstructs similar source parse trees at decoding time instead of taking multiple source parse trees as input for decoding. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method, and has little impact on decoding speed in practice.

6p hongdo_1 12-04-2013 40 2 Download
Scientific report: "A Fast and Accurate Method for Approximate String Search"

This paper proposes a new method for approximate string search, specifically candidate generation in spelling error correction: given a misspelled word, the system finds the words in a dictionary that are most “similar” to it. The paper proposes a probabilistic approach to the task, which is both accurate and efficient. The approach includes the use of a log linear model, a method for training the model, and an algorithm for finding the top k candidates. ...

10p hongdo_1 12-04-2013 49 4 Download
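
To make the candidate-generation task in the abstract above concrete, here is a minimal sketch that returns the top k dictionary words closest to a misspelled word. It swaps in Python's standard-library `difflib.get_close_matches` (a similarity-ratio heuristic) rather than the paper's log linear model; the function name `top_k_candidates`, the sample word list, and the default `cutoff` are assumptions made for this example.

```python
import difflib


def top_k_candidates(misspelled, dictionary, k=3, cutoff=0.6):
    """Return up to k dictionary words most similar to `misspelled`.

    Illustrative stand-in only: ranks by difflib's sequence similarity
    ratio, not by a trained probabilistic model.
    """
    return difflib.get_close_matches(misspelled, dictionary, n=k, cutoff=cutoff)


words = ["ape", "apple", "peach", "puppy"]  # hypothetical dictionary
print(top_k_candidates("appel", words))
```

Candidates are returned best match first, and words scoring below `cutoff` are dropped, so the list may hold fewer than k entries.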