Imbalanced data
-
Minority oversampling is a standard approach for adjusting the ratio between classes in imbalanced data. However, established methods often yield only modest improvements in classification performance when applied to data with an extremely imbalanced class distribution or to mixed-type data.
8p
vighostrider
25-05-2023
3
2
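For orientation, the simplest form of minority oversampling mentioned in the abstract above can be sketched as follows. This is plain random oversampling (duplicating minority rows), not the method proposed in the listed paper. The function name `random_oversample` and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class that can be duplicated.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: four majority rows (class 0) and one minority row (class 1).
rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = [0, 0, 0, 0, 1]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Because the copies are exact duplicates, this adds no new information about the minority class, which is one reason such baselines tend to help only modestly on extremely imbalanced data.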
-
To predict the characteristics of external causes of road traffic accident (RTA) injuries and mortality, we compared performances based on differences in the correction and classification techniques for imbalanced samples.
10p
viferrari
28-11-2022
3
2
-
This sentence-level identification information is used by a teacher network to guide the baseline model’s training by sharing its classifier. Like an instructor, the classifier improves the baseline model’s ability to extract this sentence-level identification information from raw texts, thus benefiting overall performance.
17p
guernsey
28-12-2021
10
0
-
In this paper, we survey some typical facial attribute learning methods. Five major categories of state-of-the-art methods are identified: (1) Traditional Learning, (2) Deep Single Task Learning, (3) Deep Multitask Learning, (4) Imbalanced Data Solver, and (5) Facial Attribute Ontology. They range from traditional learning algorithms to deep learning, along with methods that help bridge semantic gaps based on ontology and address data imbalance. For each category of algorithms, the basic theory as well as strengths, weaknesses, and differences are discussed.
20p
angicungduoc11
18-04-2021
26
1
-
In this paper, we propose a novel solution to this problem by using generative adversarial networks to generate synthesized attack data for IDS. The synthesized attacks are merged with the original data to form the augmented dataset. Three popular machine learning techniques are trained on the augmented dataset.
13p
nguaconbaynhay11
16-04-2021
23
2
-
The paper presents a complete study of the second method, with some proposed algorithms. It focuses mainly on binary classification with kNN and SVM for imbalanced data. Experiments and comparisons among related methods confirm the pros and cons of each method with respect to accuracy and time consumption.
20p
viguam2711
11-01-2021
10
2
-
The wealth of gene expression values being generated by high throughput microarray technologies leads to complex high dimensional datasets. Moreover, many cohorts have the problem of imbalanced classes where the number of patients belonging to each class is not the same.
10p
viwyoming2711
16-12-2020
15
1
-
This paper presents a data classification problem and methods to improve imbalanced data classification. Biomedical data in particular has a very high imbalance rate, and identifying minority-class samples is very important. Many studies, such as those on Borderline-SMOTE and Random Under Border Sampling, have shown that border elements are important in imbalanced data classification.
10p
tamynhan9
02-12-2020
14
2
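The SMOTE family named in the abstract above creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that basic interpolation step. It does not implement the border-element selection of Borderline-SMOTE or the listed paper's method, and the name `smote_like` and its parameters are assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a random
    minority point and one of its k nearest minority neighbours.

    Hypothetical helper; needs at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote_like(minority, n_new=3):
    print(point)
```

Every synthetic point lies on a segment between two existing minority points, so it stays inside the region the minority class already occupies.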
-
Aptamer-protein interacting pairs serve a variety of physiological functions and have therapeutic potential in organisms. Predicting aptamer-protein interacting pairs rapidly and effectively is important for designing aptamers that bind to proteins of interest, which gives insight into the mechanisms of aptamer-protein interaction and supports the development of aptamer-based therapies.
13p
vioklahoma2711
19-11-2020
14
2
-
The Receiver Operator Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has become a popular metric for evaluating and identifying disease-related genes (features).
17p
vioklahoma2711
19-11-2020
10
0
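To illustrate why the ROC curve mentioned above suits imbalanced data, the area under it (AUC) can be computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which does not depend on the class ratio. The following is a minimal sketch of that rank-based computation. The function name `roc_auc` and the toy scores are assumptions, not taken from the listed paper.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison: the fraction of
    (positive, negative) pairs where the positive scores higher,
    counting ties as one half.

    Hypothetical helper; labels are 1 (positive) and 0 (negative).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: two negatives and two positives with classifier scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch. A sort-based version would be preferable for large datasets.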
-
The random forests algorithm is a classifier with broad generality, a wide range of applications, and robustness against overfitting. It still has some drawbacks, however. To improve the performance of random forests, this paper therefore seeks to improve imbalanced data processing, feature selection, and parameter optimization.
18p
vioklahoma2711
19-11-2020
10
0
-
This study proposes a hybrid machine learning model based on k-nearest neighbors (KNN) and Bayesian optimization (BO), named BO-KNN, for predicting the local damage of reinforced concrete (RC) panels under missile impact loading. In the proposed BO-KNN, the hyperparameters of the KNN are optimized using BO, a well-established optimization algorithm. The KNN is trained on an experimental dataset of 254 impact tests to predict four levels (classes) of damage: perforation, scabbing, penetration, and no damage.
14p
cothumenhmong8
04-11-2020
14
2
-
Mass spectra are usually acquired from the Liquid Chromatography-Mass Spectrometry (LC-MS) analysis for isotope labeled proteomics experiments. In such experiments, the mass profiles of labeled (heavy) and unlabeled (light) peptide pairs are represented by isotope clusters (2D or 3D) that provide valuable information about the studied biological samples in different conditions.
12p
vicolorado2711
23-10-2020
9
1
-
Feature selection in class-imbalance learning has gained increasing attention in recent years due to the massive growth of high-dimensional class-imbalanced data across many scientific fields. In addition to reducing model complexity and discovering key biomarkers, feature selection is an effective way to combat the class overlap that may arise in such data and that becomes a crucial factor in classification performance.
14p
vicolorado2711
22-10-2020
21
2
-
In this paper, we propose a new cluster-based sampling method to increase the accuracy of prediction models in imbalanced data classification. In tests on a number of datasets, the method achieved notable results compared with both using no data balancing strategy and a previous method.
9p
koxih_kothogmih5
04-09-2020
22
3
-
In this paper, we present an overview of the imbalanced data classification and the difficulties encountered in current approaches, from which we propose a new method, SMOTE-PLS. To evaluate the effectiveness of this new method, we conducted experiments based on standard cancer data sets from UCI sources, including breast-p, coil2000, leukemia, colon-cancer, and yeast.
9p
koxih_kothogmih5
04-09-2020
4
1
-
One of the main reasons for choosing ARC is its superior ability to handle imbalanced class distributions. It uses association rule mining, which makes sampling unnecessary in many cases that would otherwise require it. In [WZYY05], ARC was shown to produce the best result among many algorithms on the data set used for KDD-98 [Kdd98], which has a skewed class distribution. In addition, ARC can handle high dimensionality (the data set has more than 400 variables) without a considerably long running time.
34p
lenh_hoi_xung
21-02-2013
97
5