Uncertainty-based active learning with instability estimation for text classification

This article deals with pool-based active learning using uncertainty sampling. Existing uncertainty sampling methods favor instances near the decision boundary in order to increase the likelihood of selecting informative examples; our position is that this heuristic is a surrogate for selecting examples that the current iteration of the learning algorithm is likely to misclassify. To model this intuition more directly, we augment uncertainty sampling with a simple instability-based selective sampling approach, in which the degree of instability of each unlabeled example is estimated during the learning process. Experiments on seven evaluation datasets show that instability-based sampling methods achieve significant improvements over traditional uncertainty sampling. Measured by the average percentage of actively selected examples required for the learner to reach 99% of the performance obtained by training on the entire dataset, both instability sampling and sampling by instability and density reduce annotation cost more effectively than random sampling and traditional entropy-based uncertainty sampling. Our results also show that instability-based methods yield no significant improvement for active learning with SVMs when a popular sigmoid function is used to transform SVM outputs into posterior probabilities.
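As a concrete illustration of the two selection criteria, the Python sketch below scores an unlabeled pool both by prediction entropy (the traditional uncertainty criterion) and by a simple instability estimate. The instability measure shown, the fraction of successive learning iterations in which an example's predicted label flips, is an illustrative assumption rather than the paper's exact estimator, and all function names are hypothetical.

    import numpy as np

    def entropy_uncertainty(probs: np.ndarray) -> np.ndarray:
        """Entropy of the predicted class distribution for each unlabeled
        example; higher entropy means closer to the decision boundary.
        probs: (n_examples, n_classes) posteriors from the current model."""
        eps = 1e-12  # guard against log(0)
        return -np.sum(probs * np.log(probs + eps), axis=1)

    def instability_scores(label_history: np.ndarray) -> np.ndarray:
        """Assumed instability estimate: fraction of consecutive learning
        iterations in which the predicted label of an unlabeled example
        changed. label_history: (n_iterations, n_examples) predicted labels,
        one row per active-learning iteration (needs at least two rows)."""
        changes = label_history[1:] != label_history[:-1]
        return changes.mean(axis=0)

    def select_batch(probs, label_history, batch_size=10):
        """Rank the pool by instability, breaking ties by entropy, and
        return the indices of the top batch_size examples to annotate."""
        instab = instability_scores(label_history)
        ent = entropy_uncertainty(probs)
        order = np.lexsort((-ent, -instab))  # last key (instability) is primary
        return order[:batch_size]

Weighting each score by the example's average similarity to its pool neighbors would give a sampling-by-instability-and-density variant of the same ranking.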

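The SVM result concerns transforming raw SVM outputs into posterior probabilities with a popular sigmoid function; Platt's method, which fits P(y = 1 | f) = 1 / (1 + exp(A f + B)) to held-out decision values f, is the standard choice. Below is a minimal sketch, using scikit-learn's off-the-shelf sigmoid calibration as a stand-in for whatever implementation the paper used.

    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.svm import LinearSVC

    def platt_sigmoid(f: np.ndarray, A: float, B: float) -> np.ndarray:
        """Platt's sigmoid: P(y = 1 | f) = 1 / (1 + exp(A * f + B)), where f
        is the raw SVM decision value and A, B are fit by regularized
        maximum likelihood on held-out decision values."""
        return 1.0 / (1.0 + np.exp(A * f + B))

    # scikit-learn wraps the fitting of A and B as "sigmoid" calibration;
    # this is an off-the-shelf stand-in, not the paper's own code.
    calibrated_svm = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
    # calibrated_svm.fit(X_labeled, y_labeled)
    # probs = calibrated_svm.predict_proba(X_unlabeled)  # posteriors for the entropy score above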