Paper Information - Computational Complexity of Probabilistic Disambiguation


Recent models of natural language processing employ statistical reasoning to deal with the ambiguity of formal grammars. In this approach, statistics concerning the linguistic phenomena of interest are gathered from actual linguistic data and used to estimate the probabilities of the various entities that a given grammar generates, e.g., derivations, parse-trees, and sentences. Extending grammars with probabilities makes it possible to state ambiguity resolution as a constrained optimization problem: maximize the probability of some entity that the grammar generates given the input (e.g., find the maximum-probability parse-tree for a given input sentence). Implementing these optimization formulae as efficient algorithms, however, does not always proceed smoothly. In this paper, we address the computational complexity of ambiguity resolution under various kinds of probabilistic models. We prove that some frequently occurring problems of ambiguity resolution are NP-complete. These problems are encountered in various applications, e.g., language understanding for text- and speech-based applications. Under the common model of computation, and unless P = NP, this result implies that, for many existing probabilistic models, it is not possible to devise tractable algorithms for solving these optimization problems.
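To make the optimization view concrete, here is a minimal brute-force sketch of two disambiguation objectives for a single input sentence: picking the most probable derivation, and picking the most probable parse-tree when several derivations can yield the same tree. The toy derivations, their probabilities, and the function names are hypothetical illustrations, not taken from the paper, and explicit enumeration is only feasible for tiny examples because the number of derivations can grow exponentially with sentence length.

```python
from collections import defaultdict

# Hypothetical toy data (an assumption for illustration): every derivation of
# one input sentence, given as a (parse_tree, probability) pair. Two of the
# derivations produce the same parse-tree.
derivations = [
    ("(S (NP a) (VP b))", 0.30),
    ("(S (NP a) (VP b))", 0.25),
    ("(S (X a b))", 0.40),
]


def most_probable_derivation(derivs):
    """Return the single (tree, probability) pair with the highest probability."""
    return max(derivs, key=lambda d: d[1])


def most_probable_parse(derivs):
    """Sum probabilities over derivations sharing a tree, then take the argmax."""
    totals = defaultdict(float)
    for tree, prob in derivs:
        totals[tree] += prob
    best_tree = max(totals, key=totals.get)
    return best_tree, totals[best_tree]


if __name__ == "__main__":
    print("most probable derivation:", most_probable_derivation(derivations))
    print("most probable parse:     ", most_probable_parse(derivations))
```

In this toy case the two objectives disagree: the best single derivation belongs to one tree, while the other tree wins once the probabilities of its derivations are summed. This is only meant to show why the choice of objective matters; it says nothing about how any particular model in the paper is defined.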

KHALIL SIMA’AN
