![](images/graphics/blank.gif)
Large training datasets.
-
Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. Active health screening for CRC yielded detection of an increasingly younger adults. However, current machine learning algorithms that are trained using older adults and smaller datasets, may not perform well in practice for large populations.
13p
vischultz
20-10-2023
1
1
Download
-
Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task.
25p
viarchimedes
26-01-2022
12
0
Download
-
The pretrained BERT multilingual model is used to generate embedding vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences from the output summary are eliminated by the Maximal Marginal Relevance method. Our system is evaluated with both English and Vietnamese languages using CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results compared to existing works using the same dataset.
21p
spiritedaway36
25-11-2021
7
1
Download
-
Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods conducted on large datasets, the estimated upper limit accuracy is yet to be reached.
18p
vioklahoma2711
19-11-2020
11
2
Download
-
Feature selection is to find useful and relevant features from an original feature space to effectively represent and index a given dataset. It is very important for classification and clustering problems, which may be quite difficult to solve when the amount of attributes in a given training data is very large.
9p
vititan2711
13-08-2019
14
1
Download
-
In this paper, the pre-training method based on denoising auto-encoder is investigated and proved to be good models for initializing bottleneck networks of Vietnamese speech recognition system that result in better recognition performance compared to base bottleneck features reported previously. The experiments are carried out on the dataset containing speeches on Voice of Vietnam channel (VOV).
10p
thuyliebe
04-10-2018
27
0
Download
-
Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use selftraining in order to improve the quality of a parser and to adapt it to a different domain, using only small amounts of manually annotated seed data. We report significant improvement both when the seed and test data are in the same domain and in the outof-domain adaptation scenario. ...
8p
hongvang_1
16-04-2013
45
2
Download
-
We present a novel probabilistic classifier, which scales well to problems that involve a large number of classes and require training on large datasets. A prominent example of such a problem is language modeling. Our classifier is based on the assumption that each feature is associated with a predictive strength, which quantifies how well the feature can predict the class by itself. The predictions of individual features can then be combined according to their predictive strength, resulting in a model, whose parameters can be reliably and efficiently estimated.
6p
hongdo_1
12-04-2013
44
3
Download
-
This paper presents an exponential model for translation into highly inflected languages which can be scaled to very large datasets. As in other recent proposals, it predicts targetside phrases and can be conditioned on sourceside context. However, crucially for the task of modeling morphological generalizations, it estimates feature parameters from the entire training set rather than as a collection of separate classifiers.
9p
hongdo_1
12-04-2013
49
3
Download
-
The automatic interpretation of noun-noun compounds is an important subproblem within many natural language processing applications and is an area of increasing interest. The problem is difficult, with disagreement regarding the number and nature of the relations, low inter-annotator agreement, and limited annotated data. In this paper, we present a novel taxonomy of relations that integrates previous relations, the largest publicly-available annotated dataset, and a supervised classification method for automatic noun compound interpretation.
10p
hongdo_1
12-04-2013
58
1
Download
-
This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset.
10p
nghetay_1
07-04-2013
34
2
Download
-
We present a joint model for Chinese word segmentation and new word detection. We present high dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling. As we know, training a word segmentation system on large-scale datasets is already costly.
10p
nghetay_1
07-04-2013
47
1
Download
CHỦ ĐỀ BẠN MUỐN TÌM
![](images/graphics/blank.gif)