Margin-based active learning for structured predictions

Margin-based active learning remains the most widely used active learning paradigm due to its simplicity and empirical success. However, most work is limited to binary or multiclass prediction problems, which restricts its applicability to the complex prediction problems where active learning would be most useful. For example, machine learning techniques for natural language processing applications often require combining multiple interdependent prediction problems, generally referred to as learning in structured output spaces. In many such application domains, complexity is further managed by decomposing a complex prediction into a sequence of predictions in which earlier predictions serve as input to later ones, commonly referred to as a pipeline model. This work describes methods for extending existing margin-based active learning techniques to these two settings, thus widening the scope of problems to which active learning can be applied. We empirically validate the proposed techniques by showing reduced annotated data requirements on multiple instances of synthetic data, a semantic role labeling task, and a named entity and relation extraction system.
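
As a point of reference for the binary and multiclass setting mentioned above, the following is a minimal sketch of the generic margin heuristic: query the unlabeled instance whose top two class scores are closest. It is not the structured-output or pipeline method this work proposes, whose details the abstract does not give. The function names and the toy scores are hypothetical, chosen only for illustration.

```python
def multiclass_margin(scores):
    """Gap between the highest and second-highest class scores.

    `scores` is a list of real-valued classifier scores, one per class
    (an assumed input format for this sketch).
    """
    top, runner_up = sorted(scores, reverse=True)[:2]
    return top - runner_up


def select_query(pool_scores):
    """Index of the unlabeled instance with the smallest margin."""
    return min(range(len(pool_scores)),
               key=lambda i: multiclass_margin(pool_scores[i]))


# Toy pool of three unlabeled instances scored over three classes.
pool_scores = [
    [2.0, 0.5, -1.0],  # confident: large gap between top two classes
    [0.9, 0.8, 0.1],   # uncertain: top two classes nearly tied
    [1.2, -0.3, 0.4],  # moderately confident
]

query_index = select_query(pool_scores)
print(query_index)  # index of the instance to send for annotation
```

In this toy pool the second instance is selected, since its two best classes are nearly tied. A small margin is read here as low confidence, so the label for that instance is assumed to be the most informative one to request.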
