Imbalanced data

Showing 1-17 of 17 results for Imbalanced data

Synthetic minority oversampling of vital statistics data with generative adversarial networks

Minority oversampling is a standard approach used for adjusting the ratio between the classes on imbalanced data. However, established methods often provide modest improvements in classification performance when applied to data with extremely imbalanced class distribution and to mixed-type data.

8p vighostrider 25-05-2023 3 2
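
As a point of reference for the abstract above, the simplest way to adjust the class ratio is plain random oversampling, which duplicates minority rows until the classes are the same size. The sketch below is only that baseline, not the GAN-based method the paper describes; the function name `random_oversample` and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows that belong to this class in the original data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: four majority rows (class 0) and one minority row (class 1).
rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Because duplicated rows add no new information, this baseline tends to help little on extremely imbalanced data, which is the limitation the abstract points to.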

Comparison of mortality prediction models for road traffic accidents: An ensemble technique for imbalanced data

To predict the characteristics of external causes of road traffic accident (RTA) injuries and mortality, we compared performances based on differences in the correction and classification techniques for imbalanced samples.

10p viferrari 28-11-2022 3 2

Classifier-adaptation knowledge distillation framework for relation extraction and event detection with imbalanced data

This sentence-level identification information is used by a teacher network to guide the baseline model’s training by sharing its classifier. Like an instructor, the classifier improves the baseline model’s ability to extract this sentence-level identification information from raw texts, thus benefiting overall performance.

17p guernsey 28-12-2021 10 0
An overview of facial attribute learning

In this paper, we survey typical facial attribute learning methods and identify five major categories of state-of-the-art methods: (1) Traditional Learning, (2) Deep Single Task Learning, (3) Deep Multitask Learning, (4) Imbalanced Data Solver, and (5) Facial Attribute Ontology. These range from traditional learning algorithms to deep learning, along with methods that use ontologies to bridge semantic gaps and methods that address data imbalance. For each category of algorithms, the basic theory, strengths, weaknesses, and differences are discussed.

20p angicungduoc11 18-04-2021 26 1
Handling imbalanced data in intrusion detection systems using generative adversarial networks

In this paper, we propose a novel solution to this problem by using generative adversarial networks to generate synthesized attack data for IDS. The synthesized attacks are merged with the original data to form the augmented dataset. Three popular machine learning techniques are trained on the augmented dataset.

13p nguaconbaynhay11 16-04-2021 23 2
Data balancing methods by fuzzy rough sets

The paper presents a complete study of the second method, together with some proposed algorithms. It focuses mainly on binary classification of imbalanced data with kNN and SVM. Experiments and comparisons among related methods confirm the pros and cons of each method with respect to accuracy and running time.

20p viguam2711 11-01-2021 10 2
A balanced iterative random forest for gene selection from microarray data

The wealth of gene expression values being generated by high throughput microarray technologies leads to complex high dimensional datasets. Moreover, many cohorts have the problem of imbalanced classes where the number of patients belonging to each class is not the same.

10p viwyoming2711 16-12-2020 15 1
BAM: Border adjustment method improve the efficiency of imbalanced biological data classification

This paper presents a data classification problem and methods to improve imbalanced data classification. Biomedical data in particular has a very high imbalance rate, and identifying minority-class samples is very important. Many studies, such as those on Borderline-SMOTE and Random Under Border Sampling, have shown that border elements are important in imbalanced data classification.

10p tamynhan9 02-12-2020 14 2
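
To make "border elements" concrete, the sketch below flags minority samples using the rule from Borderline-SMOTE: a minority sample counts as borderline when at least half, but not all, of its m nearest neighbours belong to the majority class. This is not the BAM method itself, which the snippet does not describe; the function name `borderline_minority` and the 1-D toy points are assumptions made for illustration.

```python
import math


def borderline_minority(points, labels, minority, m=3):
    """Return indices of minority samples whose m nearest neighbours are mostly, but not all, majority."""
    danger = []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != minority:
            continue
        # Every other sample, ordered by Euclidean distance to p.
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        majority_neighbours = sum(labels[j] != minority for j in others[:m])
        # All-majority neighbourhoods are treated as noise, mostly-minority ones as safe.
        if m / 2 <= majority_neighbours < m:
            danger.append(i)
    return danger


# Hypothetical 1-D data: class 0 is the majority, class 1 the minority.
points = [(0.0,), (0.5,), (1.0,), (1.5,), (1.8,), (3.0,), (3.2,), (3.4,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
danger = borderline_minority(points, labels, minority=1)
print(danger)
```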
Prediction of aptamer-protein interacting pairs using an ensemble classifier in combination with various protein sequence attributes

Aptamer-protein interacting pairs perform a variety of physiological functions and have therapeutic potential in organisms. Predicting aptamer-protein interacting pairs rapidly and effectively is important for designing aptamers that bind to particular proteins of interest, which in turn gives insight into the mechanisms of aptamer-protein interaction and supports the development of aptamer-based therapies.

13p vioklahoma2711 19-11-2020 14 2
AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity

The Receiver Operator Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been widely used as a metric to evaluate and identify disease-related genes (features).

17p vioklahoma2711 19-11-2020 10 0
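
One way to see why the area under the ROC curve (AUC) suits imbalanced data is that it equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, so the class ratio does not enter the calculation. The sketch below computes AUC directly from that definition. It does not reproduce the AVC feature selection method; the function name `auc` and the toy scores are assumptions made for illustration.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two positive and six negative samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 0, 0, 0]
print(auc(scores, labels))  # 11 of the 12 positive-negative pairs are ordered correctly
```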
CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests

The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization.

18p vioklahoma2711 19-11-2020 10 0
A hybrid model for predicting missile impact damages based on K-nearest neighbors and bayesian optimization

This study proposed a hybrid machine learning model based on k-nearest neighbors (KNN) and Bayesian optimization (BO), named BO-KNN, for predicting the local damages of reinforced concrete (RC) panels under missile impact loading. In the proposed BO-KNN, the hyperparameters of the KNN were optimized using BO, a well-established optimization algorithm. Accordingly, the KNN was trained on an experimental dataset that consists of 254 impact tests to predict four levels (or classes) of damage: perforation, scabbing, penetration, and no damage.

14p cothumenhmong8 04-11-2020 14 2
Quality control of imbalanced mass spectra from isotopic labeling experiments

Mass spectra are usually acquired from the Liquid Chromatography-Mass Spectrometry (LC-MS) analysis for isotope labeled proteomics experiments. In such experiments, the mass profiles of labeled (heavy) and unlabeled (light) peptide pairs are represented by isotope clusters (2D or 3D) that provide valuable information about the studied biological samples in different conditions.

12p vicolorado2711 23-10-2020 9 1
Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data

Feature selection in class-imbalance learning has gained increasing attention in recent years due to the massive growth of high-dimensional class-imbalanced data across many scientific fields. In addition to reducing model complexity and discovering key biomarkers, feature selection is also an effective method of combating overlapping which may arise in such data and become a crucial aspect for determining classification performance.

14p vicolorado2711 22-10-2020 21 2
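
For readers unfamiliar with the Hellinger distance named in the title above, the sketch below computes it between the two class-conditional distributions of a single binned feature. Each class histogram is normalised on its own, so the score does not change when the majority class simply grows. This is only the distance, not the paper's stable sparse selection procedure; the names `normalise` and `hellinger` and the toy counts are assumptions made for illustration.

```python
import math


def normalise(counts):
    """Turn raw bin counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same bins (0 to 1)."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


# Hypothetical histogram of one binned feature, counted separately per class.
minority_counts = [1, 1, 8]
majority_counts = [800, 100, 100]

score = hellinger(normalise(minority_counts), normalise(majority_counts))
print(f"feature score: {score:.3f}")
```

A feature whose two class histograms barely overlap scores close to 1, and a feature that looks the same in both classes scores close to 0.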
A new method based on clustering improves the efficiency of imbalanced data classification

In this paper, to increase the accuracy of prediction models on imbalanced data classification problems, we propose a new cluster-based sampling method. In tests on a number of datasets, the method achieved important results compared with using no data balancing strategy and with previous methods.

9p koxih_kothogmih5 04-09-2020 22 3
A new hybrid method to improve the effectiveness of cancer data classification

In this paper, we present an overview of the imbalanced data classification and the difficulties encountered in current approaches, from which we propose a new method, SMOTE-PLS. To evaluate the effectiveness of this new method, we conducted experiments based on standard cancer data sets from UCI sources, including breast-p, coil2000, leukemia, colon-cancer, and yeast.

9p koxih_kothogmih5 04-09-2020 4 1
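
The snippet above does not say how SMOTE-PLS combines its two parts, so the sketch below covers only the standard SMOTE interpolation step that the name refers to: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy points are assumptions made for illustration, and the PLS stage is not shown.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # The k nearest minority neighbours of x, excluding x itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical 2-D minority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=5)
print(len(new_points))
```
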
Customer-Driven Marketing Strategy Creating Value for Target Customers

One of the main reasons for choosing ARC is its superior ability to handle imbalanced class distributions. It uses association rule mining, which makes sampling unnecessary in many cases that would otherwise require it. In [WZYY05], ARC was shown to produce the best result among many algorithms on the data set used for KDD-98 [Kdd98], which has a skewed class distribution. In addition, ARC can handle high dimensionality (the data set has more than 400 variables) without a considerably long running time.

34p lenh_hoi_xung 21-02-2013 97 5