Something Borrowed, Something Blue: Rule-based Combination of POS Taggers

Linguistically annotated text resources are still scarce for many languages and text types, mainly because creating them requires a major investment of work and time. It is therefore worthwhile to investigate novel ways of reusing existing resources. In this paper, we investigate how off-the-shelf part-of-speech (POS) taggers can be combined to better cope with text of a type on which they were not trained and for which no training corpora are readily available. Using freely available taggers for German (although the method we describe is not language-dependent), we illustrate how such taggers can be combined by means of linguistically motivated rules so that the tagging accuracy of the combination exceeds that of the best individual tagger.
