Classification Active Learning Based on Mutual Information

Selecting a subset of samples to label from a large pool of unlabeled data points, such that a sufficiently accurate classifier is obtained from a reasonably small training set, is a challenging yet critical problem. It is challenging because solving it involves cumbersome combinatorial computations, and critical because labeling is an expensive and time-consuming task, so the number of required labels should be kept as small as possible. While information-theoretic objectives, such as the mutual information (MI) between the labels, have been used successfully in sequential querying, generalizing them to batch mode is not straightforward: evaluating and optimizing functions that are trivial when queries are selected one at a time becomes intractable for many objectives when multiple queries are selected at once. In this paper, we develop a framework that provides efficient ways of evaluating and maximizing the MI between labels as an objective for batch-mode active learning. The proposed framework reduces the computational complexity from an order proportional to the batch size, when no approximation is applied, to a linear cost. We evaluate the framework on data sets from several fields; the results show that it leads to efficient active learning for most of them.
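To make the two ingredients named above concrete, the following is a minimal sketch, not the approximation developed in the paper. It shows (a) the textbook definition of MI between two discrete variables computed from a joint probability table, and (b) a generic greedy loop that builds a batch by repeatedly adding the candidate with the largest marginal gain of a set objective. The function names `mutual_information` and `greedy_batch`, and the `objective` callable, are hypothetical and introduced only for illustration.

```python
import math
from typing import Callable, FrozenSet, Hashable, List, Sequence


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """MI in nats between two discrete variables, given their joint table.

    joint[i][j] is P(X = i, Y = j); the entries are assumed to sum to 1.
    """
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # the term 0 * log(0) is taken to be 0
                mi += p * math.log(p / (row_marginals[i] * col_marginals[j]))
    return mi


def greedy_batch(
    pool: Sequence[Hashable],
    batch_size: int,
    objective: Callable[[FrozenSet[Hashable]], float],
) -> List[Hashable]:
    """Build a batch by greedily adding the item with the largest marginal gain.

    `objective` is a hypothetical set function (for example, an MI estimate);
    this loop is a generic heuristic, not the paper's specific procedure.
    """
    selected: List[Hashable] = []
    remaining = list(pool)
    for _ in range(min(batch_size, len(remaining))):
        current = objective(frozenset(selected))
        best = max(
            remaining,
            key=lambda x: objective(frozenset(selected + [x])) - current,
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch the greedy loop calls `objective` once per remaining candidate at every step, so its cost depends on how expensive a single evaluation of the objective is; the paper's contribution concerns making that evaluation tractable for MI, which is not reproduced here.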
