An AdaBoost Using a Weak-Learner Generating Several Weak-Hypotheses for Large Training Data of Natural Language Processing

AdaBoost is a method that creates a final hypothesis by repeatedly generating a weak hypothesis at each training iteration with a given weak learner. AdaBoost-based algorithms have been successfully applied to several tasks, such as Natural Language Processing (NLP) and OCR. However, learning on training data consisting of a large number of samples and features requires a long training time. We propose a fast AdaBoost-based algorithm for learning rules represented by combinations of features. Our algorithm constructs a final hypothesis by learning several weak hypotheses at each iteration. We assign a confidence-rated value to each weak hypothesis while ensuring a reduction in the theoretical upper bound of the training error of AdaBoost. We evaluate our methods on English POS tagging and text chunking. The experimental results show that the training speed of our algorithm is, on average, about 25 times faster than an AdaBoost-based learner and about 50 times faster than Support Vector Machines with a polynomial kernel, while maintaining state-of-the-art accuracy.
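
For intuition, the confidence-rated weighting mentioned above follows the general Schapire-Singer scheme: each weak hypothesis outputs a real-valued confidence, and sample weights are updated so that the product of the per-round normalization factors, which upper-bounds the training error, keeps shrinking. The sketch below is a minimal single-hypothesis-per-round version of that scheme over binary features (as in typical POS-tagging or chunking feature vectors); it is not the paper's multi-hypothesis weak learner, and the abstaining-stump form and function names are illustrative assumptions.

```python
import numpy as np


def train_confidence_rated_adaboost(X, y, rounds=10, eps=1e-8):
    """Confidence-rated AdaBoost with abstaining decision stumps on binary
    features (standard Schapire-Singer scheme; illustrative sketch, not the
    paper's multi-hypothesis weak learner).

    X: (n_samples, n_features) array of 0/1 feature indicators.
    y: (n_samples,) array of +1 / -1 labels.
    Returns a list of (feature_index, confidence) pairs.
    """
    n, _ = X.shape
    w = np.full(n, 1.0 / n)                 # sample distribution D_t
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(X.shape[1]):
            fired = X[:, j] == 1
            w_pos = w[fired & (y == 1)].sum()
            w_neg = w[fired & (y == -1)].sum()
            # Normalization factor Z for a stump that outputs a constant
            # confidence when feature j fires and abstains otherwise;
            # a smaller Z gives a larger guaranteed drop in the error bound.
            z = w[~fired].sum() + 2.0 * np.sqrt(w_pos * w_neg)
            if best is None or z < best[0]:
                c = 0.5 * np.log((w_pos + eps) / (w_neg + eps))
                best = (z, j, c)
        _, j, c = best
        ensemble.append((j, c))
        # Reweight: h_t(x) = c if feature j fires, else 0, so the margin is y * h_t(x).
        margin = y * np.where(X[:, j] == 1, c, 0.0)
        w *= np.exp(-margin)
        w /= w.sum()
    return ensemble


def predict(ensemble, X):
    """Sign of the summed confidences of all fired weak hypotheses."""
    scores = np.zeros(X.shape[0])
    for j, c in ensemble:
        scores += np.where(X[:, j] == 1, c, 0.0)
    return np.where(scores >= 0, 1, -1)
```

With this stump form, the optimal confidence for a fired feature is c = ½ ln(W+/W-), and each round greedily picks the feature that minimizes Z = W0 + 2·sqrt(W+·W-); the algorithm described in the abstract instead learns several such weak hypotheses per iteration while still ensuring that the upper bound on the training error decreases.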
