Active Learning Literature Survey

The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research are also presented.
