Class Imbalance and Active Learning

The performance of a predictive model is tightly coupled with the data used during training. In active learning (AL), the model itself plays a hands-on role in selecting examples for labeling from a large pool of unlabeled examples. This chapter focuses on the interaction between AL and class imbalance, discussing (i) AL techniques designed specifically for imbalanced settings, (ii) strategies that leverage AL to overcome the deleterious effects of class imbalance, and (iii) how extreme class imbalance can prevent AL systems from selecting useful examples, along with alternatives to AL in these cases.
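As a rough illustration of what "the model selecting examples for labeling" can mean, the sketch below shows one common selection rule, uncertainty sampling, in which the examples whose predicted positive-class probability lies closest to 0.5 are queried first. It is a minimal sketch and not the chapter's own method: the function `uncertainty_sample`, the one-feature `toy_model`, and the pool values are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence


def uncertainty_sample(
    pool: Sequence[float],
    predict_proba: Callable[[float], float],
    k: int = 1,
) -> List[int]:
    """Return indices of the k pool examples the model is least sure about.

    Uncertainty is measured as the distance of P(positive) from 0.5,
    so smaller distances are ranked first.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(predict_proba(pool[i]) - 0.5),
    )
    return ranked[:k]


def toy_model(x: float) -> float:
    """Hypothetical stand-in for a trained classifier: a logistic score."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical unlabeled pool: most examples score as confident negatives,
# mimicking an imbalanced setting.
pool = [-4.0, -3.5, -3.0, -0.2, 2.5]

# Ask for the two examples to send to a human labeler.
picked = uncertainty_sample(pool, toy_model, k=2)
print(picked)
```

In this toy pool the example near the decision boundary is queried first. Under extreme imbalance, a real pool may contain so few minority examples that a rule like this rarely surfaces them, which is the failure mode named in point (iii).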
