The Computational Complexity of Rule-Based Part-of-Speech Tagging
The paper deals with the computational complexity of Part-of-Speech tagging (also known as morphological disambiguation) by means of rules derived from loosened negative n-grams. Loosened negative n-grams [2] were originally developed as a tool for pure verification of the results of Part-of-Speech tagging (corpus quality checking). It is shown that while verification can be performed in polynomial time, the time consumed by the tagging (disambiguation) task cannot be bounded by a polynomial in the general case. The results presented in the paper are relevant above all for disambiguation performed by means of Constraint-based Grammars [1] and similar frameworks, which are in fact only notational variants of the rules derived via loosened negative n-grams. Throughout the paper, some familiarity with finite-state automata (FSA) and the class of NP problems is assumed.
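The contrast between verification and disambiguation can be illustrated with a minimal sketch. It is not the paper's formalism: the rule format `(first, blockers, second)`, the tag names, and the functions `violates`, `verify`, and `disambiguate` are all assumptions made for illustration. A rule is read here as "`first` may not be followed by `second` unless some tag from `blockers` occurs in between". Checking one fully tagged sentence is a single pass per rule, whereas the naive disambiguator shown here enumerates every combination of candidate tags, which grows exponentially with sentence length. The brute-force search only illustrates the gap; it does not by itself establish any lower bound.

```python
from itertools import product

# Hypothetical rule format (an assumption, not the paper's notation):
# (first, blockers, second) forbids `first` ... `second` unless a tag
# from `blockers` appears somewhere between them.
RULES = [("DET", frozenset({"NOUN"}), "VERB")]


def violates(tags, rule):
    """Return True if the tag sequence contains the forbidden pattern."""
    first, blockers, second = rule
    armed = False  # True once `first` was seen and no blocker has intervened
    for tag in tags:
        if armed and tag == second:
            return True
        if tag == first:
            armed = True
        elif tag in blockers:
            armed = False
    return False


def verify(tags, rules):
    """Verification: one linear pass per rule over a fixed tag sequence."""
    return not any(violates(tags, rule) for rule in rules)


def disambiguate(candidates, rules):
    """Naive disambiguation: try every combination of candidate tags.

    The number of combinations is the product of the candidate-set sizes,
    so this search is exponential in the worst case.
    """
    return [list(combo) for combo in product(*candidates) if verify(combo, rules)]


# Each inner list holds the candidate tags of one word (made-up example).
candidates = [["DET"], ["NOUN", "VERB"], ["VERB", "NOUN"]]
readings = disambiguate(candidates, RULES)
```

With the made-up input above, the two readings that place a verb directly after the determiner are rejected and the other two survive.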
[1] Karel Oliva, et al. Achieving an Almost Correct PoS-Tagged Corpus, 2002, TSD.
[2] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.
[3] Atro Voutilainen, et al. A language-independent system for parsing unrestricted text, 1995.