Active Learning

The key idea behind active learning is that a machine learning algorithm can perform better with less training data if it is allowed to choose the data from which it learns. An active learner may pose "queries," usually in the form of unlabeled data instances to be labeled by an "oracle" (e.g., a human annotator) that already understands the nature of the problem. This approach is well motivated in many modern machine learning and data mining applications, where unlabeled data may be abundant or easy to come by, but training labels are difficult, time-consuming, or expensive to obtain. This book is a general introduction to active learning. It outlines several scenarios in which queries might be formulated, and details many query selection algorithms, organized into four broad categories, or "query selection frameworks." We also touch on some of the theoretical foundations of active learning, and conclude with an overview of the strengths and weaknesses of these approaches in practice, including a summary of ongoing work to address the open challenges and opportunities that remain.
