The Computational Complexity of Rule-Based Part-of-Speech Tagging

This paper examines the computational complexity of Part-of-Speech tagging (also known as morphological disambiguation) by means of rules derived from loosened negative n-grams. Loosened negative n-grams [2] were originally developed as a tool for pure verification of the results of Part-of-Speech tagging (corpus quality checking). It is shown that while verification can be performed in polynomial time, the time consumed by the tagging (disambiguation) task cannot be bounded by a polynomial in the general case. The results are relevant above all for disambiguation performed by means of Constraint-based Grammars [1] and similar frameworks, which are in fact only notational variants of the rules derived via loosened negative n-grams. Some familiarity with finite-state automata (FSA) and the class of NP problems is assumed throughout the paper.
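
To make the contrast between verification and tagging concrete, the following is a minimal Python sketch under stated assumptions. It is not the construction used in the paper. It assumes that a loosened negative n-gram can be simplified to a triple (first, fillers, last), read as "the tag first, followed by any number of tags from fillers, followed by the tag last, must not occur". The tag names, the single toy rule, and the function names violates, verify and consistent_readings are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Assumed encoding (not taken from the paper): a rule is a triple
# (first, fillers, last) forbidding `first`, then zero or more tags
# from `fillers`, then `last`.

def violates(tags, rule):
    """Scan the tag sequence once, like a two-state automaton."""
    first, fillers, last = rule
    armed = False  # True while `first` was seen and only fillers followed
    for tag in tags:
        if armed and tag == last:
            return True
        if tag == first:
            armed = True
        elif tag not in fillers:
            armed = False
    return False

def verify(tags, rules):
    """Verification: one linear pass per rule over a fully tagged sentence."""
    return not any(violates(tags, rule) for rule in rules)

def consistent_readings(candidates, rules):
    """Naive disambiguation baseline: try every combination of candidate tags.

    The number of combinations is the product of the per-token candidate
    counts, so this grows exponentially with sentence length.
    """
    return [list(r) for r in product(*candidates) if verify(r, rules)]

# Toy rule, invented for illustration only.
RULES = [("DET", frozenset({"ADV"}), "VERB")]

if __name__ == "__main__":
    print(verify(["DET", "NOUN", "VERB"], RULES))
    print(verify(["DET", "ADV", "VERB"], RULES))
    print(consistent_readings([["DET"], ["ADV", "NOUN"], ["VERB", "NOUN"]], RULES))
```

Checking a single, fully tagged sentence costs one pass per rule. The brute-force search over ambiguous tokens only illustrates how large the space of readings can become; it says nothing by itself about whether a faster tagging procedure exists.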