
A Hybrid Approach to Part-of-Speech Tagging

Part-of-Speech (PoS) Tagging – the automatic annotation of lexical categories – is a widely used early stage of linguistic text analysis. One approach, rule-based morphological analysis, employs linguistic knowledge in the form of hand-coded rules to derive a set of possible analyses for each input token, but is known to produce highly ambiguous results. Stochastic tagging techniques such as Hidden Markov Models (HMMs) use both lexical and bigram probabilities estimated from a tagged training corpus to compute the most likely PoS tag sequence for each input sentence, but make no allowance for prior linguistic knowledge. In this report, I describe the dwdst PoS tagging library, which uses a rule-based morphological component to extend traditional HMM techniques with lexical class probabilities and theoretically motivated search space reduction.
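The HMM decoding step described above (finding the most likely tag sequence from lexical and bigram probabilities) is conventionally done with the Viterbi algorithm. The following is a minimal sketch under stated assumptions, not the dwdst implementation: it assumes a first-order (bigram) HMM with invented toy probabilities, and it models the morphology-driven search space reduction only as a hypothetical `allowed` mapping from each token to the tags its morphological analysis licenses. The names `viterbi_tag`, `allowed`, `TAGS`, `START`, `TRANS`, `EMIT` and `MORPH` are illustrative; dwdst's lexical class probabilities are not modeled here (emissions are plain word-given-tag probabilities).

```python
import math
from typing import Dict, List, Optional, Set, Tuple

NEG_INF = float("-inf")


def log_prob(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_tag(
    tokens: List[str],
    tagset: List[str],
    start_p: Dict[str, float],              # P(t_1)
    trans_p: Dict[Tuple[str, str], float],  # bigram P(t_i | t_{i-1}), keyed (prev, cur)
    emit_p: Dict[Tuple[str, str], float],   # lexical P(w_i | t_i), keyed (tag, word)
    allowed: Optional[Dict[str, Set[str]]] = None,  # hypothetical morphological analyses
) -> List[str]:
    """Return the most likely tag sequence for `tokens` under a bigram HMM.

    If `allowed` is given, each token is only considered with the tags its
    (assumed) morphological analysis licenses; tokens the analyser does not
    cover fall back to the full tagset. This shrinks the Viterbi trellis.
    """
    if not tokens:
        return []

    def candidates(tok: str) -> List[str]:
        if allowed is None or tok not in allowed:
            return tagset
        restricted = [t for t in tagset if t in allowed[tok]]
        return restricted or tagset

    cands = [candidates(tok) for tok in tokens]

    # delta[i][t]: best log-probability of any tag path ending in tag t at position i
    delta: List[Dict[str, float]] = [
        {t: log_prob(start_p.get(t, 0.0)) + log_prob(emit_p.get((t, tokens[0]), 0.0))
         for t in cands[0]}
    ]
    back: List[Dict[str, str]] = [{}]

    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in cands[i]:
            # pick the predecessor tag that maximises the path score into t
            best_prev, best_score = cands[i - 1][0], NEG_INF
            for s in cands[i - 1]:
                score = delta[i - 1][s] + log_prob(trans_p.get((s, t), 0.0))
                if score > best_score:
                    best_prev, best_score = s, score
            scores[t] = best_score + log_prob(emit_p.get((t, tokens[i]), 0.0))
            pointers[t] = best_prev
        delta.append(scores)
        back.append(pointers)

    # backtrack from the highest-scoring final tag
    last = max(delta[-1], key=delta[-1].__getitem__)
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy parameters, invented for illustration (not estimated from any corpus)
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
         ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.3, ("NOUN", "DET"): 0.1,
         ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1}
EMIT = {("DET", "the"): 0.7, ("NOUN", "dog"): 0.4, ("VERB", "dog"): 0.05,
        ("NOUN", "barks"): 0.1, ("VERB", "barks"): 0.3}
MORPH = {"the": {"DET"}, "dog": {"NOUN", "VERB"}, "barks": {"NOUN", "VERB"}}

if __name__ == "__main__":
    sentence = ["the", "dog", "barks"]
    print(viterbi_tag(sentence, TAGS, START, TRANS, EMIT))         # ['DET', 'NOUN', 'VERB']
    print(viterbi_tag(sentence, TAGS, START, TRANS, EMIT, MORPH))  # ['DET', 'NOUN', 'VERB']
```

Restricting each token to its licensed tags leaves the Viterbi recurrence itself unchanged; it only removes trellis states, so the per-sentence cost drops from roughly O(n·|T|²) toward O(n·k²) when each token has at most k analyses. The restriction can also change the result: if the analyser licensed only NOUN for "barks", the same call would return ['DET', 'NOUN', 'NOUN'].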

Bryan Jurish