Czech Morphological Tagset Revisited

Much of natural language processing is built on top of solid morphological annotation. In this paper we present an update of the Czech morphological tagset as given by the analyzer Ajka that has been used for academic as well as commercial purposes for more than a dozen years. The revision responds to practical issues that we faced while developing subsequent NLP tools, parsers in the first place. We describe the reasoning behind each change and include the full reference manual for the updated tagset. Finally, we provide a comparison with, and a mapping to, the Universal tagset produced by Google.

Vojtech Kovár | Milos Jakubícek | Pavel Smerk

[1] Pavel Smerk. Fast Morphological Analysis of Czech, 2009, RASLAN.

[2] Karel Pala, et al. DESAM - Annotated Corpus for Czech, 1997, SOFSEM.

[3] Slav Petrov, et al. A Universal Part-of-Speech Tagset, 2011, LREC.