Paper Information - The Computational Complexity of Rule-Based Part-of-Speech Tagging

The paper deals with the computational complexity of Part-of-Speech tagging (also known as morphological disambiguation) by means of rules derived from loosened negative n-grams. Loosened negative n-grams [2] were originally developed as a tool for pure verification of the results of Part-of-Speech tagging (corpus quality checking). It is shown that while verification can be done in polynomial time, the time consumed by the tagging (disambiguation) task cannot be bounded by a polynomial in the general case. The results presented in the paper are relevant above all for disambiguation performed by means of Constraint-based Grammars [1] and similar frameworks, which are in fact only notational variants of the rules derived via loosened negative n-grams. Throughout the paper, some familiarity with finite-state automata (FSA) and the class of NP problems is assumed.
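As a rough illustration of the contrast between verification and disambiguation, the following minimal Python sketch checks a tagging against a set of forbidden tag pairs and then searches naively for all consistent taggings. It rests on several assumptions that are not taken from the paper: it uses plain negative bigrams over adjacent tags rather than loosened negative n-grams, and the tag names, the `FORBIDDEN` set, and both function names are hypothetical. It does not reproduce the paper's construction. For plain adjacent bigrams a dynamic program would suffice, so the sketch only shows the difference between checking one tagging and searching over many.

```python
from itertools import product

# Hypothetical forbidden tag pairs (plain negative bigrams), for illustration only.
FORBIDDEN = {("DET", "VERB"), ("DET", "DET")}


def verify(tags):
    """Check one fully tagged sentence in a single left-to-right pass."""
    return all((a, b) not in FORBIDDEN for a, b in zip(tags, tags[1:]))


def disambiguate_brute_force(candidates):
    """Enumerate every combination of candidate tags and keep the consistent ones.

    The number of combinations is the product of the per-word candidate counts,
    so this naive search grows exponentially with sentence length.
    """
    return [list(seq) for seq in product(*candidates) if verify(seq)]


# Toy input: each word comes with its candidate tags (hypothetical tag names).
candidates = [["DET"], ["NOUN", "VERB"], ["NOUN", "VERB"]]

print(verify(["DET", "NOUN", "VERB"]))       # one tagging, one linear pass
print(disambiguate_brute_force(candidates))  # all taggings that violate no rule
```

Verification touches each adjacent pair once, whereas the brute-force search may have to inspect every combination of candidate tags.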

Karel Oliva | Pavel Kveton | Roman Ondruska

[1] Karel Oliva, et al. Achieving an Almost Correct PoS-Tagged Corpus, 2002, TSD.

[2] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[3] Atro Voutilainen, et al. A language-independent system for parsing unrestricted text, 1995.