Optimised probabilistic active learning (OPAL): for fast, non-myopic, cost-sensitive active classification

In contrast to the ever-increasing volume of automatically generated data, human annotation capacity remains limited. Thus, fast active learning approaches that allow the efficient allocation of annotation effort are gaining importance. Furthermore, cost-sensitive applications such as fraud detection pose the additional challenge of differing misclassification costs between classes. Unfortunately, the few existing cost-sensitive active learning approaches rely on time-consuming steps, such as performing self-labelling or tedious evaluations over samples. We propose a fast, non-myopic, and cost-sensitive probabilistic active learning approach for binary classification. Our approach computes the expected reduction in misclassification loss in a labelling candidate's neighbourhood. We derive and use a closed-form solution for this expectation, which considers the possible values of the true posterior of the positive class at the candidate's position, its possible label realisations, and the given labelling budget. The resulting myopic algorithm runs in the same linear asymptotic time as uncertainty sampling, while its non-myopic counterpart requires an additional factor of O(m · log m) in the budget size m. The experimental evaluation on several synthetic and real-world data sets shows competitive or better classification performance and runtime, compared to several uncertainty-sampling- and error-reduction-based active learning strategies, in both cost-sensitive and cost-insensitive settings.
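
To make the abstract's core idea concrete, the following is a minimal numerical sketch of the kind of expectation described above: the expected reduction in cost-sensitive misclassification loss in a candidate's neighbourhood, averaged over the unknown true posterior of the positive class and over the possible realisations of the future labels. OPAL evaluates this quantity in closed form; the sketch below approximates it by quadrature instead. The function names (`opal_style_gain`, `expected_loss`), the uniform Beta(1, 1) prior, and the per-label budget normalisation are assumptions made for illustration, not the authors' exact formulation.

```python
import numpy as np
from scipy.stats import beta, binom


def expected_loss(p_true, p_hat, tau, c_fp, c_fn):
    """Misclassification loss at a point whose true positive-class posterior is
    p_true, when the classifier thresholds its estimate p_hat at tau."""
    if p_hat > tau:                      # predict positive
        return (1.0 - p_true) * c_fp
    return p_true * c_fn                 # predict negative


def opal_style_gain(n, k, m, c_fp=1.0, c_fn=1.0, grid=200):
    """Approximate the expected reduction in misclassification loss from
    acquiring m more labels in a neighbourhood that currently holds n labels,
    k of them positive.

    Assumptions: a uniform Beta(1, 1) prior over the true posterior p, and a
    budget-normalised (per-label) score. OPAL computes this expectation in
    closed form rather than by the numeric double loop used here.
    """
    tau = c_fp / (c_fp + c_fn)           # cost-optimal decision threshold
    ps = np.linspace(1e-6, 1 - 1e-6, grid)
    w = beta.pdf(ps, k + 1, n - k + 1)   # density of the true posterior p
    w /= w.sum()                         # quadrature weights on the grid

    p_hat_now = (k + 1) / (n + 2)        # current (smoothed) estimate
    gain = 0.0
    for p, wp in zip(ps, w):
        loss_now = expected_loss(p, p_hat_now, tau, c_fp, c_fn)
        # expectation over the m future label realisations j ~ Binomial(m, p)
        loss_later = 0.0
        for j in range(m + 1):
            p_hat_later = (k + j + 1) / (n + m + 2)
            loss_later += binom.pmf(j, m, p) * \
                expected_loss(p, p_hat_later, tau, c_fp, c_fn)
        gain += wp * (loss_now - loss_later)
    return gain / m                      # per-label gain (normalisation is an assumption)


# Illustrative ranking of three candidates, each described by the label
# statistics of its neighbourhood (n labels, k positives) and a density
# weight; the density weighting follows the probabilistic-active-learning
# idea of preferring candidates in well-populated regions.
candidates = [(3, 1, 0.8), (10, 9, 0.5), (0, 0, 0.9)]
scores = [d * opal_style_gain(n, k, m=3, c_fp=1.0, c_fn=5.0)
          for n, k, d in candidates]
print(scores)
```

With an asymmetric cost ratio (here a false negative is five times as costly as a false positive), the sketch illustrates how the same neighbourhood statistics can yield different gains than in the cost-insensitive case, which is the behaviour the closed-form OPAL score is designed to capture efficiently.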
