Application of Weighted Voting Taggers to Languages Described with Large Tagsets

Manuscript received 7 July 2008; revised 2 July 2009
Communicated by Milan Rusko

Abstract. The paper presents baseline and complex part-of-speech taggers applied to the modified corpus of the Frequency Dictionary of Contemporary Polish, annotated with a large tagset. First, the paper examines the accuracy of 6 baseline part-of-speech taggers. The main part of the work presents simple weighted voting and complex voting taggers. Special attention is paid to lexical voting methods and to the issues of ties and fallbacks. The TagPair and WPDV voting methods achieve the top accuracy among all considered methods. The error reduction of 10.8% with respect to the best baseline tagger for the large tagset is comparable with other authors' results for small tagsets.

Keywords: Part-of-speech tagging, combination tagger, weighted probability distribution voting tagger, TagPair tagger

Mathematics Subject Classification 2000: 68T50, 68T05, 68T35
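As an illustration of the simple weighted voting mentioned in the abstract, the following minimal sketch shows one possible way to combine the tags proposed by several taggers and to resolve a tie through a fallback tagger. It is not the paper's implementation: the function name `weighted_vote`, the tagger names, the tags and the weights are all hypothetical, and the fallback rule (use the tag of one designated tagger) is only an assumption about how ties might be handled.

```python
from collections import defaultdict


def weighted_vote(proposals, weights, fallback):
    """Pick a tag for one token by weighted voting.

    proposals: dict mapping tagger name -> tag proposed for the token
    weights:   dict mapping tagger name -> voting weight (hypothetical values)
    fallback:  name of the tagger whose tag is used when the vote is tied
               (an assumed tie-breaking rule, not taken from the paper)
    """
    # Sum the weights of all taggers that voted for each candidate tag.
    scores = defaultdict(float)
    for tagger, tag in proposals.items():
        scores[tag] += weights[tagger]

    # Collect every tag that reached the highest total weight.
    best = max(scores.values())
    winners = [tag for tag, score in scores.items() if score == best]

    # A single winner is returned directly; a tie goes to the fallback tagger.
    if len(winners) == 1:
        return winners[0]
    return proposals[fallback]


# Hypothetical example: three taggers, two of them agree.
clear_case = weighted_vote(
    {"tagger_a": "subst", "tagger_b": "adj", "tagger_c": "subst"},
    {"tagger_a": 0.5, "tagger_b": 0.75, "tagger_c": 0.5},
    fallback="tagger_b",
)

# Hypothetical example: two taggers with equal weight disagree (a tie).
tied_case = weighted_vote(
    {"tagger_a": "subst", "tagger_b": "adj"},
    {"tagger_a": 0.5, "tagger_b": 0.5},
    fallback="tagger_b",
)
```

In practice the weights would typically be derived from each tagger's accuracy on held-out data, and the fallback would usually be the best single tagger; both choices are left open in this sketch.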
