Combining Multiple Classifiers to Improve Part of Speech Tagging: A Case Study for Brazilian Portuguese

Four taggers have been trained on a 100,000-word corpus of Brazilian Portuguese: Unigram (TreeTagger), N-gram (TreeTagger), transformation-based (TBL) and maximum-entropy (MXPOST) tagging. MXPOST achieved the best accuracy (88.73%), which is still well below state-of-the-art accuracy for English. This lower accuracy is attributed to the small size of the training corpus. Twelve combination methods were tested, four of which improved on the MXPOST accuracy. The best result (89.42%) was obtained with a majority-wins voting strategy.
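Per-token majority voting over several taggers' outputs can be sketched as below. This is a minimal illustration, not the authors' implementation. The abstract does not say how ties are broken or which tagset is used. With four voters, 2-2 ties can occur, so here ties are assumed to go to the best single tagger (MXPOST in the abstract). The function names `majority_vote` and `accuracy`, the tags, and the example sentence are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(
    tag_sequences: Sequence[Sequence[str]],
    tiebreak_index: int = 0,
) -> list[str]:
    """Combine per-token tags from several taggers by majority vote.

    tag_sequences[i][j] is the tag that tagger i assigns to token j.
    ASSUMPTION: ties go to the tagger at tiebreak_index (e.g. the most
    accurate single tagger); if that tagger's tag is not among the tied
    winners, the first winning tag in tagger order is used.
    """
    if not tag_sequences:
        raise ValueError("need at least one tagger")
    n_tokens = len(tag_sequences[0])
    if any(len(seq) != n_tokens for seq in tag_sequences):
        raise ValueError("all taggers must tag the same tokens")

    combined = []
    for j in range(n_tokens):
        votes = Counter(seq[j] for seq in tag_sequences)
        top = max(votes.values())
        winners = {tag for tag, count in votes.items() if count == top}
        preferred = tag_sequences[tiebreak_index][j]
        if preferred in winners:
            combined.append(preferred)
        else:
            # Deterministic fallback: first winning tag in tagger order.
            combined.append(next(seq[j] for seq in tag_sequences if seq[j] in winners))
    return combined


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical example: "a casa caiu", tagged by four taggers
# listed with MXPOST first so that it wins ties.
gold = ["ART", "N", "V"]
mxpost = ["ART", "N", "N"]
tbl = ["ART", "N", "V"]
ngram = ["PREP", "V", "V"]
unigram = ["ART", "V", "V"]

combined = majority_vote([mxpost, tbl, ngram, unigram], tiebreak_index=0)
print(combined, accuracy(mxpost, gold), accuracy(combined, gold))
```

In this toy case the second token is a 2-2 tie that MXPOST's tag resolves, while on the third token the other three taggers outvote MXPOST's error. This mirrors how voting can raise accuracy above that of the best single member.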
