Imbalanced Data Learning

An imbalanced training dataset can pose serious problems for many real-world data-mining tasks that conduct supervised learning. In this chapter,\(^\dagger\) we present a kernel-boundary-alignment algorithm, which considers training-data imbalance as prior information to augment SVMs to improve class-prediction accuracy. Using a simple example, we first show that SVMs can suffer from high incidences of false negatives when the training instances of the target class are heavily outnumbered by the training instances of a non-target class. The remedy we propose is to adjust the class boundary by modifying the kernel matrix, according to the imbalanced data distribution. Through theoretical analysis backed by empirical study, we show that the kernel-boundary-alignment algorithm works effectively on several datasets.

[1]  N. Cristianini,et al.  On Kernel-Target Alignment , 2001, NIPS.

[2]  Alexander J. Smola,et al.  Hyperkernels , 2002, NIPS.

[3]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[4]  Robert C. Holte,et al.  Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria , 2000, ICML.

[5]  Tom Fawcett,et al.  Adaptive Fraud Detection , 1997, Data Mining and Knowledge Discovery.

[6]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[7]  Koby Crammer,et al.  Kernel Design Using Boosting , 2002, NIPS.

[8]  John Shawe-Taylor,et al.  Optimizing Classifers for Imbalanced Training Sets , 1998, NIPS.

[9]  Akira Iwata,et al.  A Solution for Imbalanced Training Sets Problem by CombNET-II and Its Application on Fog Forecasting , 2002 .

[10]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[11]  Christopher J. C. Burges,et al.  A Tutorial on Support Vector Machines for Pattern Recognition , 1998, Data Mining and Knowledge Discovery.

[12]  Edward Y. Chang,et al.  Adaptive Feature-Space Conformal Transformation for Imbalanced-Data Learning , 2003, ICML.

[13]  Claire Cardie,et al.  Improving Minority Class Prediction Using Case-Specific Feature Weights , 1997, ICML.

[14]  Edward Y. Chang,et al.  Multi-camera spatio-temporal fusion and biased sequence-data learning for security surveillance , 2003, MULTIMEDIA '03.

[15]  Nello Cristianini,et al.  Controlling the Sensitivity of Support Vector Machines , 1999 .

[16]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[17]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[18]  Gary M. Weiss Mining with rarity: a unifying framework , 2004, SKDD.

[19]  Yi Lin,et al.  Support Vector Machines for Classification in Nonstandard Situations , 2002, Machine Learning.

[20]  John Shawe-Taylor,et al.  Refining Kernels for Regression and Uneven Classification Problems , 2003, AISTATS.

[21]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[22]  Si Wu,et al.  Improving support vector machine classifiers by modifying kernel functions , 1999, Neural Networks.

[23]  Edward Y. Chang,et al.  KBA: kernel boundary alignment considering imbalanced data distribution , 2005, IEEE Transactions on Knowledge and Data Engineering.

[24]  Stan Matwin,et al.  Addressing the Curse of Imbalanced Training Sets: One-Sided Selection , 1997, ICML.

[25]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[26]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[27]  Edward Y. Chang,et al.  Support vector machine active learning for image retrieval , 2001, MULTIMEDIA '01.

[28]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[29]  Rohini K. Srihari,et al.  Incorporating prior knowledge with weighted margin support vector machines , 2004, KDD.

[30]  Rohini K. Srihari,et al.  New í-Support Vector Machines and their Sequential Minimal Optimization , 2003, ICML.

[31]  Foster J. Provost,et al.  Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction , 2003, J. Artif. Intell. Res..