The Best of Two Worlds: Cooperation of Statistical and Rule-Based Taggers for Czech

Several hybrid disambiguation methods are described that combine the strengths of hand-written disambiguation rules and statistical taggers. Three statistical taggers (HMM, Maximum Entropy, and Averaged Perceptron) are used in a tagging experiment on the Prague Dependency Treebank. The hybrid systems achieve better results than any other method tried for Czech tagging so far.
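The abstract does not say how the rules and the statistical taggers are combined. As a minimal sketch of one possible arrangement, assuming a serial setup in which hand-written rules first prune each word's candidate tags and a statistical model then picks among the survivors, the idea could look like this. All names here (`apply_rules`, `hybrid_tag`, `no_verb_after_det`), the toy tagset, and the per-position score dictionaries standing in for an HMM, Maximum Entropy, or Averaged Perceptron tagger are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Set

# Hypothetical types for illustration only.
Candidates = List[Set[str]]                      # possible tags per word
Rule = Callable[[List[str], Candidates], Candidates]


def apply_rules(words: List[str], candidates: Candidates,
                rules: List[Rule]) -> Candidates:
    """Apply each rule in turn; a rule may only remove tags."""
    for rule in rules:
        pruned = rule(words, candidates)
        # Safety net: if a rule would remove every tag, keep the old set.
        candidates = [new if new else old
                      for new, old in zip(pruned, candidates)]
    return candidates


def hybrid_tag(words: List[str], candidates: Candidates,
               scores: List[Dict[str, float]],
               rules: List[Rule]) -> List[str]:
    """Prune with rules, then take the best-scoring remaining tag.

    `scores` stands in for a statistical tagger's per-word tag scores.
    """
    pruned = apply_rules(words, candidates, rules)
    return [max(sorted(tags), key=lambda t: s.get(t, 0.0))
            for tags, s in zip(pruned, scores)]


def no_verb_after_det(words: List[str], candidates: Candidates) -> Candidates:
    """Toy rule: a word right after an unambiguous determiner is not verbal."""
    out = [set(c) for c in candidates]
    for i in range(1, len(out)):
        if candidates[i - 1] == {"DET"}:
            out[i] -= {"VERB", "AUX"}
    return out


words = ["the", "can"]
candidates = [{"DET"}, {"NOUN", "VERB", "AUX"}]
scores = [{"DET": 1.0}, {"AUX": 0.6, "NOUN": 0.3, "VERB": 0.1}]

statistical_only = hybrid_tag(words, candidates, scores, rules=[])
with_rules = hybrid_tag(words, candidates, scores, rules=[no_verb_after_det])
```

In this toy input the score table alone prefers `AUX` for "can", while the rule removes the verbal readings first, so the combined result is `NOUN`. The rules only narrow the choice; the statistical model still makes the final decision wherever ambiguity remains.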
