MUTE: Majority under-sampling technique

An application that operates on an imbalanced dataset loses classification performance on the minority class, which is rare yet important. A number of over-sampling techniques adjust the class distribution by inserting synthetic minority instances into the dataset. Unfortunately, these added instances substantially increase the cost of building a classifier. This paper proposes MUTE, a new, simple, and effective under-sampling technique. Its strategy is to remove noisy majority instances that overlap with minority instances. Majority instances are selected for removal according to their safe levels, following the Safe-Level-SMOTE concept. MUTE not only reduces classifier construction time, because the dataset is downsized, but also improves the prediction rate on the minority class. Experimental results show that MUTE improves the F-measure compared with SMOTE-based techniques.

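To make the strategy concrete, below is a minimal Python sketch of a MUTE-style under-sampler. The abstract does not give the exact removal criterion, so the sketch assumes (borrowing the Safe-Level-SMOTE notion of safe level) that the safe level of a majority instance is the number of majority-class neighbours among its k nearest neighbours, and that majority instances whose safe level falls below a threshold are treated as noise and discarded. The function name `mute_undersample` and the parameters `k` and `min_safe_level` are hypothetical, not from the paper.

```python
# Hypothetical sketch of a MUTE-style majority under-sampler.
# Assumption (not stated in the abstract): safe level of a majority
# instance = number of majority-class neighbours among its k nearest
# neighbours; instances below `min_safe_level` are removed as noise.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mute_undersample(X, y, majority_label, k=5, min_safe_level=3):
    """Remove majority instances that overlap the minority region."""
    X, y = np.asarray(X), np.asarray(y)
    maj_mask = y == majority_label

    # Ask for k+1 neighbours because each point is its own nearest one.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X[maj_mask])
    neighbour_labels = y[idx[:, 1:]]  # drop the self-neighbour column

    # Safe level: how many of the k neighbours are also majority.
    safe_levels = (neighbour_labels == majority_label).sum(axis=1)
    keep_majority = safe_levels >= min_safe_level

    keep = ~maj_mask  # always keep every minority instance
    keep[np.where(maj_mask)[0][keep_majority]] = True
    return X[keep], y[keep]
```

Because minority instances are always retained, the class distribution can only move toward balance; tuning `min_safe_level` trades how aggressively overlapping majority instances are pruned against how much majority-class information is preserved.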