Paper information - Using evolutionary sampling to mine imbalanced data

Class imbalance tends to cause inferior performance in data mining learners. Evolutionary sampling is a technique that seeks to counter this problem by using genetic algorithms to evolve a reduced sample of a complete dataset, which is then used to train a classification model. Evolutionary sampling works to remove noisy and duplicate instances so that the sampled training data will produce a superior classifier. We propose this novel technique as a method to handle severe class imbalance in data mining. This paper presents our research into the use of evolutionary sampling with C4.5 decision trees and compares the technique's performance with random undersampling.

Taghi M. Khoshgoftaar | Ramaswamy Narayanan | D. J. Drown
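The idea summarized in the abstract, evolving a subset of the training instances with a genetic algorithm, can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid learner stands in for C4.5, and the fitness function (geometric mean of per-class recall on the complete dataset), the truncation selection, the one-point crossover, the parameter defaults, and all function names are assumptions chosen for brevity.

```python
import random

def fit_centroids(rows):
    """Stand-in learner (NOT C4.5): one mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, features):
    """Return the class whose centroid is closest to the given point."""
    return min(model, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(model[c], features)))

def fitness(mask, data):
    """Assumed fitness: geometric mean of per-class recall, measured on the
    complete dataset, for a model trained only on the instances kept by mask."""
    sample = [row for keep, row in zip(mask, data) if keep]
    classes = {label for _, label in data}
    if not classes <= {label for _, label in sample}:
        return 0.0  # a sample that drops a whole class is useless
    model = fit_centroids(sample)
    score = 1.0
    for c in classes:
        members = [f for f, label in data if label == c]
        score *= sum(predict(model, f) == c for f in members) / len(members)
    return score ** (1.0 / len(classes))

def evolve_sample(data, pop_size=20, generations=30, mutation_rate=0.02, seed=0):
    """Evolve a boolean keep/drop mask over the instances of data."""
    rng = random.Random(seed)
    n = len(data)
    # Seed the population with the complete dataset plus random subsets.
    population = [[True] * n] + [[rng.random() < 0.5 for _ in range(n)]
                                 for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, data), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]         # bit-flip mutation
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(m, data))

# Toy imbalanced dataset: 40 majority points and 6 minority points.
rng = random.Random(1)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(40)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(6)]

best_mask = evolve_sample(data)
print(sum(best_mask), "of", len(data), "instances kept; fitness",
      round(fitness(best_mask, data), 3))
```

Because the initial population contains the full-dataset mask and the best mask is carried over unchanged each generation, the evolved sample in this sketch never scores worse than training on all instances under the assumed fitness. A random undersampling baseline, as compared in the paper, would instead drop majority-class instances at random without any fitness-guided search.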
