An AdaBoost Using a Weak-Learner Generating Several Weak-Hypotheses for Large Training Data of Natural Language Processing

AdaBoost is a method that creates a final hypothesis by repeatedly generating a weak hypothesis at each training iteration with a given weak learner. AdaBoost-based algorithms have been successfully applied to several tasks, such as Natural Language Processing (NLP) and OCR. However, learning on training data consisting of a large number of samples and features requires a long training time. We propose a fast AdaBoost-based algorithm for learning rules represented by combinations of features. Our algorithm constructs a final hypothesis by learning several weak hypotheses at each iteration. We assign a confidence-rated value to each weak hypothesis while ensuring a reduction in the theoretical upper bound of the training error of AdaBoost. We evaluate our methods on English POS tagging and text chunking. The experimental results show that the training speed of our algorithm is, on average, about 25 times faster than an AdaBoost-based learner and about 50 times faster than Support Vector Machines with a polynomial kernel, while maintaining state-of-the-art accuracy.
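
For intuition, the confidence-rated weighting mentioned above follows the general Schapire-Singer scheme: each weak hypothesis outputs a real-valued confidence, and sample weights are updated so that the product of the per-round normalization factors, which upper-bounds the training error, keeps shrinking. The sketch below is a minimal single-hypothesis-per-round version of that scheme over binary features (as in typical POS-tagging or chunking feature vectors); it is not the paper's multi-hypothesis weak learner, and the abstaining-stump form and function names are illustrative assumptions.

```python
import numpy as np


def train_confidence_rated_adaboost(X, y, rounds=10, eps=1e-8):
    """Confidence-rated AdaBoost with abstaining decision stumps on binary
    features (standard Schapire-Singer scheme; illustrative sketch, not the
    paper's multi-hypothesis weak learner).

    X: (n_samples, n_features) array of 0/1 feature indicators.
    y: (n_samples,) array of +1 / -1 labels.
    Returns a list of (feature_index, confidence) pairs.
    """
    n, _ = X.shape
    w = np.full(n, 1.0 / n)                 # sample distribution D_t
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(X.shape[1]):
            fired = X[:, j] == 1
            w_pos = w[fired & (y == 1)].sum()
            w_neg = w[fired & (y == -1)].sum()
            # Normalization factor Z for a stump that outputs a constant
            # confidence when feature j fires and abstains otherwise;
            # a smaller Z gives a larger guaranteed drop in the error bound.
            z = w[~fired].sum() + 2.0 * np.sqrt(w_pos * w_neg)
            if best is None or z < best[0]:
                c = 0.5 * np.log((w_pos + eps) / (w_neg + eps))
                best = (z, j, c)
        _, j, c = best
        ensemble.append((j, c))
        # Reweight: h_t(x) = c if feature j fires, else 0, so the margin is y * h_t(x).
        margin = y * np.where(X[:, j] == 1, c, 0.0)
        w *= np.exp(-margin)
        w /= w.sum()
    return ensemble


def predict(ensemble, X):
    """Sign of the summed confidences of all fired weak hypotheses."""
    scores = np.zeros(X.shape[0])
    for j, c in ensemble:
        scores += np.where(X[:, j] == 1, c, 0.0)
    return np.where(scores >= 0, 1, -1)
```

With this stump form, the optimal confidence for a fired feature is c = ½ ln(W+/W-), and each round greedily picks the feature that minimizes Z = W0 + 2·sqrt(W+·W-); the algorithm described in the abstract instead learns several such weak hypotheses per iteration while still ensuring that the upper bound on the training error decreases.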
