Learning on the border: active learning in imbalanced data classification

This paper is concerned with the class imbalance problem, which is known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept than of the other class. Various real-world classification tasks, such as medical diagnosis, text categorization and fraud detection, suffer from this phenomenon. Standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner with more balanced classes. We also propose an efficient way of selecting informative instances for active learning from a smaller pool of samples, which does not require a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that, with an early stopping criterion, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.
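The abstract does not spell out how the smaller pool is formed or which early stopping rule is used, so the following Python sketch shows one plausible reading under stated assumptions. It assumes a margin-based learner (for example an SVM) whose signed decision value f(x) measures closeness to the class boundary, treats the instance with the smallest |f(x)| as the most informative, and draws the pool as a uniform random subset of L unlabeled instances. If at least one of L uniform draws should fall among the fraction eta of instances closest to the boundary with probability c, then 1 - (1 - eta)^L >= c, which gives L = 59 for eta = 0.05 and c = 0.95. The stopping rule (halt once the best candidate lies outside the margin, |f(x)| >= 1) and all function names (`pool_size`, `select_query`, `should_stop`, `active_learning_loop`) are hypothetical illustrations, not the paper's confirmed method.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: maps an instance index to the learner's signed decision value f(x).
ScoreFn = Callable[[int], float]


def pool_size(top_fraction: float = 0.05, confidence: float = 0.95) -> int:
    """Smallest L with 1 - (1 - top_fraction) ** L >= confidence, i.e. the number of
    uniform draws needed so that, with probability >= confidence, at least one draw
    lies among the top_fraction of instances closest to the boundary."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


def select_query(unlabeled: Sequence[int], score: ScoreFn, size: int,
                 rng: random.Random) -> Optional[int]:
    """Return the instance with the smallest |f(x)| from a random subset of at most
    `size` unlabeled instances, instead of scanning the whole unlabeled set."""
    if not unlabeled:
        return None
    candidates = rng.sample(list(unlabeled), min(size, len(unlabeled)))
    return min(candidates, key=lambda i: abs(score(i)))


def should_stop(decision_value: float, margin: float = 1.0) -> bool:
    """Assumed early stopping rule: stop once even the best candidate is outside the margin."""
    return abs(decision_value) >= margin


def active_learning_loop(unlabeled: Sequence[int],
                         fit: Callable[[List[Tuple[int, int]]], ScoreFn],
                         query_label: Callable[[int], int],
                         size: int, rng: random.Random,
                         max_queries: int = 1000) -> List[Tuple[int, int]]:
    """Query one label at a time, retraining after each query, until the stopping rule fires."""
    remaining = list(unlabeled)
    labeled: List[Tuple[int, int]] = []
    score = fit(labeled)
    while remaining and len(labeled) < max_queries:
        idx = select_query(remaining, score, size, rng)
        if idx is None or should_stop(score(idx)):
            break
        remaining.remove(idx)
        labeled.append((idx, query_label(idx)))  # ask the oracle for this label
        score = fit(labeled)  # retrain on all labels gathered so far
    return labeled


if __name__ == "__main__":
    # Toy stand-in for a trained learner: a fixed score f(i) = i - 4.2 that ignores the labels.
    toy_scores = [i - 4.2 for i in range(10)]

    def fit(labeled: List[Tuple[int, int]]) -> ScoreFn:
        return lambda i: toy_scores[i]

    def label(i: int) -> int:
        return 1 if toy_scores[i] > 0 else -1

    print(pool_size())  # 59
    print(active_learning_loop(range(10), fit, label, pool_size(), random.Random(0)))
    # [(4, -1), (5, 1)]
```

Each selection step scores only L candidates regardless of how large the unlabeled set is, which matches the motivation the abstract gives for querying from a smaller pool. In a real setting `fit` would retrain the classifier, so the decision values, and therefore the stopping point, would change after every query.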
