Classifying Severely Imbalanced Data

Learning from data with severe class imbalance is difficult. Established solutions include under-sampling, adjusting the classification threshold, and using an ensemble. We examine how well combinations of these solutions balance sensitivity and specificity in binary classification and reduce the mean squared error (MSE) of probability estimates.
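As a rough illustration of how the three techniques can be combined, the sketch below trains an ensemble in which each member sees all minority examples plus an equally sized random under-sample of the majority class, averages the members' scores, and then picks the decision threshold that brings sensitivity and specificity closest together. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic one-dimensional Gaussian data, the toy class-mean base learner, the ensemble size of 25, the 0.01 threshold grid, and all function names. For brevity the threshold is tuned on the training data, and the averaged score is used as the probability estimate without correcting for the balanced training prior.

```python
import math
import random


def make_data(n_neg=2000, n_pos=40, seed=0):
    """Synthetic imbalanced data (assumption): negatives ~ N(0, 1), positives ~ N(2, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    xs += [rng.gauss(2.0, 1.0) for _ in range(n_pos)]
    ys = [0] * n_neg + [1] * n_pos
    return xs, ys


def fit_member(xs, ys, rng):
    """Under-sample the majority class to the minority size, then fit class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    neg_sample = rng.sample(neg, len(pos))  # random under-sampling without replacement
    mu_neg = sum(neg_sample) / len(neg_sample)
    mu_pos = sum(pos) / len(pos)
    return mu_neg, mu_pos


def member_score(model, x):
    """Score in (0, 1): logistic of the log-likelihood ratio of two unit-variance Gaussians."""
    mu_neg, mu_pos = model
    z = (mu_pos - mu_neg) * (x - (mu_pos + mu_neg) / 2.0)
    z = max(-50.0, min(50.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_score(models, x):
    """Average the member scores."""
    return sum(member_score(m, x) for m in models) / len(models)


def sens_spec(scores, ys, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, ys) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, ys) if y == 0 and s < threshold)
    n_pos = sum(ys)
    n_neg = len(ys) - n_pos
    return tp / n_pos, tn / n_neg


def pick_threshold(scores, ys):
    """Grid-search the threshold that minimizes |sensitivity - specificity|."""
    def gap(t):
        sens, spec = sens_spec(scores, ys, t)
        return abs(sens - spec)
    return min((i / 100.0 for i in range(1, 100)), key=gap)


def mse(scores, ys):
    """Mean squared error between scores and 0/1 labels."""
    return sum((s - y) ** 2 for s, y in zip(scores, ys)) / len(ys)


def run_demo(n_members=25, seed=1):
    xs, ys = make_data()
    rng = random.Random(seed)
    models = [fit_member(xs, ys, rng) for _ in range(n_members)]
    scores = [ensemble_score(models, x) for x in xs]
    threshold = pick_threshold(scores, ys)
    sens, spec = sens_spec(scores, ys, threshold)
    return threshold, sens, spec, mse(scores, ys)


if __name__ == "__main__":
    threshold, sens, spec, err = run_demo()
    print(f"threshold={threshold:.2f} sensitivity={sens:.3f} "
          f"specificity={spec:.3f} mse={err:.4f}")
```

Because every member is trained on a balanced sample, the averaged scores tend to overstate the minority-class probability relative to the true class prior. That is one reason threshold adjustment and probability quality are worth evaluating separately, as the abstract does.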
