A Method for Stopping Active Learning Based on Stabilizing Predictions and the Need for User-Adjustable Stopping

A survey of existing methods for stopping active learning (AL) reveals the need for methods that are more widely applicable, more aggressive in saving annotations, and more stable across changing datasets. A new method for stopping AL based on stabilizing predictions is presented that addresses these needs. Furthermore, stopping methods must handle a broad range of annotation/performance tradeoff valuations. Despite this, the existing body of work is dominated by conservative methods, with little (if any) attention paid to giving users control over the behavior of stopping methods. The proposed method is shown to fill a gap in the level of aggressiveness available for stopping AL and to support user control over stopping behavior.
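As a rough illustration of a stabilizing-predictions stopping criterion, the sketch below compares successive models' predictions on a fixed held-out "stop set" using Cohen's kappa agreement and stops once agreement stays above a user-adjustable threshold for several consecutive rounds. The threshold value, window size, and function names here are illustrative assumptions, not the paper's exact procedure; lowering the threshold (or window) is one way a user could make stopping more aggressive.

```python
import numpy as np

def cohen_kappa(a, b):
    """Cohen's kappa agreement between two label sequences."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.unique(np.concatenate([a, b]))
    observed = np.mean(a == b)
    # Expected chance agreement from each sequence's label distribution.
    expected = sum(np.mean(a == l) * np.mean(b == l) for l in labels)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)

def should_stop(prediction_history, threshold=0.99, window=3):
    """
    Stop when predictions on the fixed stop set have stabilized:
    kappa agreement between each pair of successive models exceeds
    `threshold` for `window` consecutive comparisons.

    `prediction_history` is a list of label arrays, one per AL round,
    all produced on the same stop set. Hypothetical parameter values.
    """
    if len(prediction_history) < window + 1:
        return False
    recent = prediction_history[-(window + 1):]
    agreements = [cohen_kappa(p, q) for p, q in zip(recent, recent[1:])]
    return all(k >= threshold for k in agreements)
```

In use, each active learning round would append the current model's predictions on the stop set to `prediction_history` and call `should_stop` before requesting the next batch of annotations.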
