Paper information - Robust ending guessing rules with application to Slavonic languages

Robust ending guessing rules with application to Slavonic languages

The paper studies the automatic extraction of diagnostic word endings for Slavonic languages, with the aim of determining grammatical, morphological and semantic properties of the underlying word. In particular, ending guessing rules are learned from a large morphological dictionary of Bulgarian in order to predict POS, gender, number, article and semantics. A simple, exact, high-accuracy algorithm is developed and compared to an approximate one that uses a scoring function previously proposed by Mikheev for POS guessing. It is shown how the number of rules of the latter can be reduced by a factor of up to 35 without sacrificing performance. The evaluation demonstrates coverage close to 100% and precision of 97-99% for the approximate algorithm.
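
The abstract does not spell out either algorithm, so the following is only a minimal sketch of the general idea of learning ending guessing rules from a dictionary. It assumes that "exact" means a rule is kept only when every dictionary word with that ending carries the same tag; the function names, the `max_len` and `min_support` parameters, and the toy transliterated lexicon are all hypothetical and not taken from the paper. Mikheev's scoring function is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy lexicon of (word, POS tag) pairs; illustrative only,
# not drawn from the Bulgarian dictionary used in the paper.
TOY_LEXICON = [
    ("kniga", "N"), ("masa", "N"),
    ("chetene", "N"), ("pisane", "N"),
    ("cheta", "V"), ("pisha", "V"),
]

def learn_exact_ending_rules(lexicon, max_len=4, min_support=2):
    """Return {ending: tag} for endings that are never ambiguous in the lexicon.

    Assumption: a rule is 'exact' when all words sharing the ending have the
    same tag and at least `min_support` words share it.
    """
    tags_by_ending = defaultdict(set)
    count_by_ending = defaultdict(int)
    for word, tag in lexicon:
        # Consider endings of length 1..max_len, always leaving
        # at least one character of the word outside the ending.
        for k in range(1, min(max_len, len(word) - 1) + 1):
            ending = word[-k:]
            tags_by_ending[ending].add(tag)
            count_by_ending[ending] += 1
    return {
        ending: next(iter(tags))
        for ending, tags in tags_by_ending.items()
        if len(tags) == 1 and count_by_ending[ending] >= min_support
    }

def guess(word, rules, max_len=4):
    """Apply the longest matching ending rule; return None if none applies."""
    for k in range(min(max_len, len(word) - 1), 0, -1):
        tag = rules.get(word[-k:])
        if tag is not None:
            return tag
    return None

rules = learn_exact_ending_rules(TOY_LEXICON)
print(rules)
print(guess("hodene", rules))  # unseen word matched by the ending "ne"
print(guess("roza", rules))    # ending "a" is ambiguous, so no rule fires
```

In this toy setting the ending "a" is shared by nouns and verbs, so no rule is produced for it, while "e" and "ne" occur only with nouns. An approximate variant would presumably keep ambiguous endings and rank them with a score rather than discarding them.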

Preslav Nakov | Elena Paskaleva
