Adaptive scheduling for adaptive sampling in POS taggers construction

Abstract We introduce adaptive scheduling for adaptive sampling as a novel machine learning strategy for the construction of part-of-speech taggers. The goal is to speed up training on large data sets without a significant loss of performance with respect to an optimal configuration. In contrast to previous methods, which use a random, fixed or regularly increasing spacing between instances, ours analyzes the shape of the learning curve geometrically, in conjunction with a functional model, to increase or decrease that spacing at any time. The algorithm proves to be formally correct with respect to our working hypotheses: given an instance, the next one selected is the nearest that guarantees a net gain in learning ability over the former, and the level of requirement for this condition can be modulated. We also improve the robustness of sampling by paying greater attention to those regions of the training database subject to a temporary inflation in performance, thus preventing the learning from stopping prematurely. The proposal has been evaluated on the basis of its reliability in identifying the convergence of models, corroborating our expectations. While a concrete halting condition is used for testing, users can choose whatever condition suits their own specific needs.
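The core idea of the abstract, widening or narrowing the spacing between sampled training sizes according to the local shape of the learning curve, can be sketched as follows. This is a minimal illustration, not the authors' formulation: the function name, the finite-difference slope estimate, and the `min_gain`, `grow` and `shrink` parameters are all assumptions introduced here.

```python
def next_spacing(sizes, scores, step, grow=2.0, shrink=0.5, min_gain=1e-3):
    """Choose the spacing to the next sampled training-set size.

    sizes  -- training-set sizes evaluated so far (increasing)
    scores -- tagger accuracies observed at those sizes
    step   -- current spacing between consecutive samples
    """
    if len(sizes) < 2:
        return step  # not enough points to estimate the curve's slope yet
    # finite-difference estimate of the learning curve's local slope
    slope = (scores[-1] - scores[-2]) / (sizes[-1] - sizes[-2])
    gain = slope * step  # accuracy gain expected over one more step
    if gain > min_gain:
        # curve still rising steeply: stretch the spacing and skip ahead
        return int(step * grow)
    # curve flattening (or temporarily inflated): narrow the spacing so
    # convergence is examined more closely before any halting decision
    return max(1, int(step * shrink))
```

For example, a steep early segment (0.80 to 0.90 over 100 instances) doubles the spacing, while a nearly flat late segment halves it, which mirrors the paper's goal of sampling sparsely where learning is fast and densely near convergence.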
