Implementing an Efficient Part-Of-Speech Tagger

An efficient implementation of a part-of-speech tagger for Swedish is described. The stochastic tagger uses a well-established Markov model of the language. The tagger tags 92% of unknown words correctly and up to 97% of all words. Several implementation and optimization considerations are discussed. The main contribution of this paper is the thorough description of the tagging algorithm and the addition of a number of improvements. The paper contains enough detail for the reader to construct a tagger for his own language.
