Implementing an efficient part‐of‐speech tagger

An efficient implementation of a part‐of‐speech tagger for Swedish is described. The stochastic tagger uses a well‐established Markov model of the language. The tagger tags 92 per cent of unknown words correctly and up to 97 per cent of all words. Several implementation and optimization considerations are discussed. The main contributions of this paper are a thorough description of the tagging algorithm and a number of improvements to it. The paper contains enough detail for readers to construct a tagger for their own language. Copyright © 1999 John Wiley & Sons, Ltd.
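
As an illustration of the kind of Markov-model tagging the abstract refers to, the following is a minimal sketch of Viterbi decoding for a first-order (bigram) hidden Markov model. It is not the paper's implementation: the abstract does not state the model order, the smoothing, or the handling of unknown words, so the function name `viterbi`, the `floor` parameter, and the toy probabilities are all assumptions chosen for the example.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence under a bigram HMM.

    Hypothetical helper for illustration only. Probabilities missing from
    the tables are replaced by `floor` (an assumed, crude smoothing) so
    that log() is always defined.
    """
    def lp(p):
        return math.log(max(p, floor))

    if not words:
        return []

    # best[i][t]: log-probability of the best path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    # back[i][t]: the previous tag on that best path
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Toy model with made-up probabilities (not taken from the paper).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.15, "VERB": 0.05}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.01, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
# prints ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on long sentences. A practical tagger would replace the fixed `floor` with proper smoothing and a dedicated model for unknown words.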
