MUTE: Majority under-sampling technique

An application that operates on an imbalanced dataset loses classification performance on the minority class, which is rare yet important. A number of over-sampling techniques adjust the class distribution by inserting synthetic minority instances into the dataset. Unfortunately, these added instances substantially increase the cost of building a classifier. This paper proposes MUTE, a new, simple, and effective under-sampling technique. Its strategy is to remove noisy majority instances that overlap with minority instances. Majority instances are selected for removal according to their safe levels, following the Safe-Level-SMOTE concept. MUTE not only reduces classifier construction time, because the dataset is downsized, but also improves the prediction rate on the minority class. Experimental results show that MUTE improves the F-measure compared with SMOTE-based techniques.

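To make the strategy concrete, below is a minimal Python sketch of a MUTE-style under-sampler. The abstract does not give the exact removal criterion, so the sketch assumes (borrowing the Safe-Level-SMOTE notion of safe level) that the safe level of a majority instance is the number of majority-class neighbours among its k nearest neighbours, and that majority instances whose safe level falls below a threshold are treated as noise and discarded. The function name `mute_undersample` and the parameters `k` and `min_safe_level` are hypothetical, not from the paper.

```python
# Hypothetical sketch of a MUTE-style majority under-sampler.
# Assumption (not stated in the abstract): safe level of a majority
# instance = number of majority-class neighbours among its k nearest
# neighbours; instances below `min_safe_level` are removed as noise.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mute_undersample(X, y, majority_label, k=5, min_safe_level=3):
    """Remove majority instances that overlap the minority region."""
    X, y = np.asarray(X), np.asarray(y)
    maj_mask = y == majority_label

    # Ask for k+1 neighbours because each point is its own nearest one.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X[maj_mask])
    neighbour_labels = y[idx[:, 1:]]  # drop the self-neighbour column

    # Safe level: how many of the k neighbours are also majority.
    safe_levels = (neighbour_labels == majority_label).sum(axis=1)
    keep_majority = safe_levels >= min_safe_level

    keep = ~maj_mask  # always keep every minority instance
    keep[np.where(maj_mask)[0][keep_majority]] = True
    return X[keep], y[keep]
```

Because minority instances are always retained, the class distribution can only move toward balance; tuning `min_safe_level` trades how aggressively overlapping majority instances are pruned against how much majority-class information is preserved.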