A comparative study of classifier combination applied to NLP tasks

This paper presents a comparative study of classifier combination methods, which have been applied successfully to many tasks, including Natural Language Processing (NLP) tasks. A variety of classifier combination techniques exists, and the major difficulty is choosing the one that best fits a particular task. In our study we explored the performance of a number of combination methods, such as voting, Bayesian merging, behavior knowledge space, bagging, stacking, feature sub-spacing, and cascading, on the part-of-speech tagging task using nine corpora in five languages. The results show that some methods that are currently not very popular can demonstrate much better performance. In addition, we learned how corpus size and quality influence the performance of the combination methods. We also provide the results of applying the classifier combination methods to other NLP tasks, such as named entity recognition and chunking. We believe that our study is the most exhaustive comparison of combination methods applied to NLP tasks so far.
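To make the simplest of the listed methods concrete, the following is a minimal sketch of voting applied to part-of-speech tagging. It is not the implementation used in the study: the function name `majority_vote`, the tie-breaking rule (the earliest tagger in the list wins), and the example tag sequences are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine tag sequences from several taggers by plurality vote.

    predictions[i][j] is the tag that tagger i assigns to token j.
    Ties are broken in favor of the earliest tagger in the list
    (an assumption of this sketch, not a rule taken from the paper).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must label the same number of tokens")

    combined = []
    # zip(*predictions) yields, for each token, the tags proposed by all taggers.
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # Pick the first tag, in tagger order, that reaches the top count.
        combined.append(next(t for t in token_tags if counts[t] == best))
    return combined


# Hypothetical outputs of three taggers for a three-token sentence.
tagger_outputs = [
    ["DT", "NN", "VB"],
    ["DT", "VB", "VB"],
    ["DT", "NN", "NN"],
]
print(majority_vote(tagger_outputs))  # prints ['DT', 'NN', 'VB']
```

Each tagger's error on the second and third token is outvoted by the other two. The other methods named above, such as stacking or behavior knowledge space, replace this fixed rule with a combiner learned from the base classifiers' outputs.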
