A Unified POS Tagging Architecture and its Application to Greek

This paper proposes a flexible, unified tagging architecture that can be incorporated into applications such as information extraction, cross-language information retrieval, term extraction, and summarization, while also providing an essential component for subsequent syntactic processing or lexicographical work. It introduces a feature-based, multi-tiered approach to part-of-speech tagging (the FBT tagger). FBT is a variant of the well-known transformation-based learning paradigm that aims to improve tagging quality for highly inflective languages such as Greek. A large-scale experiment on Greek is also reported, with results for a variety of text genres, including financial reports, newswires, press releases, and technical manuals. Finally, the adopted evaluation methodology is discussed.
