Query-by-committee improvement with diversity and density in batch active learning

Abstract

Active learning has gained attention as a method to accelerate the learning of classifiers. To this end, uncertainty sampling is a widely adopted strategy that selects the instances closest to the decision boundary. However, uncertainty sampling alone may not be sufficient in batch active learning, because the selected instances can be redundant and the strategy is susceptible to outliers. In this study, we use query-by-committee (QBC) to measure uncertainty and show that its performance can be improved by introducing diversity and density into instance utility. Test results show that incorporating diversity and density into instance selection significantly improves uncertainty sampling by QBC. Furthermore, we investigate several distance measures for computing diversity and density and show that random forest dissimilarity can be an effective distance measure in batch active learning. We also analyze how the characteristics of the data affect the results.
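
As a rough illustration of the idea summarized above, the following sketch scores each unlabeled instance by committee disagreement, weights it by density, and then picks a batch greedily so that new picks stay far from instances already chosen. It is not the paper's method: the vote-entropy disagreement measure, the multiplicative utility, the Euclidean distance (used here in place of random forest dissimilarity), and the names `vote_entropy`, `pairwise_distances`, and `select_batch` are all assumptions made for this example.

```python
import numpy as np


def vote_entropy(votes, n_classes):
    """Committee disagreement per instance.

    votes: integer labels of shape (n_members, n_instances).
    """
    n_members = votes.shape[0]
    # Fraction of members voting for each class: (n_instances, n_classes).
    frac = np.stack(
        [(votes == c).sum(axis=0) for c in range(n_classes)], axis=1
    ) / n_members
    # Treat 0 * log(0) as 0.
    logs = np.log(frac, where=frac > 0, out=np.zeros_like(frac))
    return -(frac * logs).sum(axis=1)


def pairwise_distances(X):
    """Euclidean distance matrix (assumed stand-in for any dissimilarity)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def select_batch(X, votes, n_classes, batch_size):
    """Greedy batch selection: uncertainty * density * diversity (assumed form)."""
    n = len(X)
    dist = pairwise_distances(X)
    uncertainty = vote_entropy(votes, n_classes)
    # Density: a small average distance to the pool gives a higher score,
    # which down-weights outliers.
    density = 1.0 / (1.0 + dist.mean(axis=1))
    base = uncertainty * density

    chosen = np.zeros(n, dtype=bool)
    min_dist = np.ones(n)  # neutral diversity factor before the first pick
    batch = []
    for _ in range(batch_size):
        utility = base * min_dist
        utility[chosen] = -np.inf  # never pick the same instance twice
        pick = int(np.argmax(utility))
        batch.append(pick)
        # Diversity: distance to the nearest instance already in the batch.
        min_dist = dist[pick] if not chosen.any() else np.minimum(min_dist, dist[pick])
        chosen[pick] = True
    return batch


# Toy pool: instances 0 and 1 are near-duplicates, 2 is nearby, 3-5 are far away.
X = np.array([[0, 0], [0.01, 0], [1, 0], [3, 3], [3, 3.1], [3.1, 3]], dtype=float)
# Four committee members split 2-2 on instances 0-2 and agree on instances 3-5.
votes = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
])
batch = select_batch(X, votes, n_classes=2, batch_size=2)
```

In this toy pool, uncertainty alone cannot separate instances 0, 1, and 2, since the committee disagrees on all three equally. The diversity factor is what keeps the two near-duplicates from both entering the same batch.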
