Improving Accuracy in Word Class Tagging through the Combination of Machine Learning Systems

We examine how differences in language models, learned by different data-driven systems performing the same NLP task, can be exploited to yield a higher accuracy than the best individual system. We do this by means of experiments involving the task of morphosyntactic word class tagging, on the basis of three different tagged corpora. Four well-known tagger generators (hidden Markov model, memory-based, transformation rules, and maximum entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. All combination taggers outperform their best component. The reduction in error rate varies with the material in question, but can be as high as 24.3% with the LOB corpus.
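To illustrate the general idea of combining tagger outputs by voting, the following is a minimal sketch of simple majority voting over per-token tags. It is not the paper's implementation: the function names `majority_vote` and `combine_sentence`, the tie-breaking rule (prefer the tag proposed by the earliest-listed tagger among the tied tags), and the example tags are all assumptions made for illustration. The abstract does not specify which voting strategies were used.

```python
from collections import Counter


def majority_vote(tags):
    """Return the most frequent tag among the component taggers' proposals.

    `tags` holds one proposed tag per component tagger, in a fixed tagger
    order. Ties are broken in favour of the earliest-listed tagger whose
    tag is among the tied candidates (an illustrative assumption).
    """
    counts = Counter(tags)
    top = max(counts.values())
    tied = {tag for tag, n in counts.items() if n == top}
    # Walk the taggers in order and take the first tag that is tied for top.
    return next(tag for tag in tags if tag in tied)


def combine_sentence(tagger_outputs):
    """Combine several taggers' tag sequences for one sentence.

    `tagger_outputs` is a list of equal-length tag sequences, one per tagger.
    """
    return [majority_vote(list(position)) for position in zip(*tagger_outputs)]


# Hypothetical outputs of four taggers for a three-token sentence.
outputs = [
    ["DT", "NN", "VB"],
    ["DT", "NN", "NN"],
    ["DT", "VB", "NN"],
    ["DT", "NN", "VB"],
]
combined = combine_sentence(outputs)
```

With four components, a two-against-two split is possible, so some tie-breaking rule is needed. Ordering the taggers so that the one trusted most comes first is one simple choice, and weighted voting schemes are an alternative.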
