Large training datasets.

Showing 1-12 of 12 results for Large training datasets.

Machine learning-based colorectal cancer prediction using global dietary data

Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. Active health screening for CRC has led to its detection in increasingly younger adults. However, current machine learning algorithms that are trained on older adults and smaller datasets may not perform well in practice for large populations.

13p vischultz 20-10-2023 1 1 Download

Biologically relevant transfer learning improves transcription factor binding prediction

Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task.

25p viarchimedes 26-01-2022 12 0 Download

A hybrid model using the pre-trained BERT and deep neural networks with rich feature for extractive text summarization

The pretrained BERT multilingual model is used to generate embedding vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences are eliminated from the output summary by the Maximal Marginal Relevance method. Our system is evaluated on both English and Vietnamese, using the CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results than existing works using the same datasets.
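The redundancy-removal step named in this abstract, Maximal Marginal Relevance (MMR), can be illustrated with a minimal sketch. It assumes each sentence and the whole document are already represented as plain numeric vectors; the paper's BERT and TF-IDF features are not reproduced here. The function names `cosine` and `mmr_select`, the trade-off weight `lam`, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(sentence_vecs, doc_vec, k, lam=0.7):
    """Greedily pick up to k sentence indices by Maximal Marginal Relevance.

    Each step picks the candidate maximizing
        lam * sim(sentence, document) - (1 - lam) * max sim(sentence, already selected).
    lam is a hypothetical default chosen for illustration only.
    """
    selected = []
    candidates = list(range(len(sentence_vecs)))

    def score(i):
        relevance = cosine(sentence_vecs[i], doc_vec)
        # Redundancy is the highest similarity to any sentence chosen so far.
        redundancy = max(
            (cosine(sentence_vecs[i], sentence_vecs[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: sentences 0 and 1 are identical, sentence 2 covers a different topic.
toy_sentences = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_document = [1.0, 1.0]
summary_indices = mmr_select(toy_sentences, toy_document, k=2)
print(summary_indices)
```

In this toy case the duplicate sentence is skipped in favor of the one covering a different topic, which is the behavior MMR is designed to produce. A lower `lam` penalizes redundancy more heavily.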

21p spiritedaway36 25-11-2021 7 1 Download
Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach

Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods trained on large datasets, the estimated upper limit of accuracy has yet to be reached.

18p vioklahoma2711 19-11-2020 11 2 Download
Feature selection and replacement by clustering attributes

Feature selection aims to find useful and relevant features from an original feature space in order to represent and index a given dataset effectively. It is very important for classification and clustering problems, which may be quite difficult to solve when the number of attributes in the training data is very large.

9p vititan2711 13-08-2019 14 1 Download
Improving bottleneck features for Vietnamese large vocabulary continuous speech recognition system using deep neural networks

In this paper, a pre-training method based on denoising auto-encoders is investigated and shown to provide good models for initializing the bottleneck networks of a Vietnamese speech recognition system, resulting in better recognition performance than the baseline bottleneck features reported previously. The experiments are carried out on a dataset of speech from the Voice of Vietnam (VOV) channel.

10p thuyliebe 04-10-2018 27 0 Download
Scientific report: "Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets"

Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use self-training in order to improve the quality of a parser and to adapt it to a different domain, using only small amounts of manually annotated seed data. We report significant improvement both when the seed and test data are in the same domain and in the out-of-domain adaptation scenario. ...

8p hongvang_1 16-04-2013 45 2 Download
Scientific report: "A Scalable Probabilistic Classifier for Language Modeling"

We present a novel probabilistic classifier, which scales well to problems that involve a large number of classes and require training on large datasets. A prominent example of such a problem is language modeling. Our classifier is based on the assumption that each feature is associated with a predictive strength, which quantifies how well the feature can predict the class by itself. The predictions of individual features can then be combined according to their predictive strength, resulting in a model whose parameters can be reliably and efficiently estimated.

6p hongdo_1 12-04-2013 44 3 Download
Scientific report: "An exponential translation model for target language morphology"

This paper presents an exponential model for translation into highly inflected languages which can be scaled to very large datasets. As in other recent proposals, it predicts target-side phrases and can be conditioned on source-side context. However, crucially for the task of modeling morphological generalizations, it estimates feature parameters from the entire training set rather than as a collection of separate classifiers.

9p hongdo_1 12-04-2013 49 3 Download
Scientific report: "A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation"

The automatic interpretation of noun-noun compounds is an important subproblem within many natural language processing applications and is an area of increasing interest. The problem is difficult, with disagreement regarding the number and nature of the relations, low inter-annotator agreement, and limited annotated data. In this paper, we present a novel taxonomy of relations that integrates previous relations, the largest publicly-available annotated dataset, and a supervised classification method for automatic noun compound interpretation.

10p hongdo_1 12-04-2013 58 1 Download
Scientific report: "Maximum Expected BLEU Training of Phrase and Lexicon Translation Models"

This paper proposes a new discriminative training method for constructing phrase and lexicon translation models. In order to reliably learn the myriad parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset.

10p nghetay_1 07-04-2013 34 2 Download
Scientific report: "Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection"

We present a joint model for Chinese word segmentation and new word detection. We introduce new high-dimensional features, including word-based features and enriched edge (label-transition) features, for the joint modeling. Training a word segmentation system on large-scale datasets is already known to be costly.

10p nghetay_1 07-04-2013 47 1 Download