Paper information - Maximum entropy models for natural language ambiguity resolution

Maximum entropy models for natural language ambiguity resolution

This thesis demonstrates that several important kinds of natural language ambiguities can be resolved to state-of-the-art accuracies using a single statistical modeling technique based on the principle of maximum entropy. We discuss the problems of sentence boundary detection, part-of-speech tagging, prepositional phrase attachment, natural language parsing, and text categorization under the maximum entropy framework. In practice, we have found that maximum entropy models offer the following advantages:

State-of-the-art accuracy. The probability models for all of the tasks discussed perform at or near state-of-the-art accuracies, or outperform competing learning algorithms when trained and tested under similar conditions. Methods which outperform those presented here require much more supervision in the form of additional human involvement or additional supporting resources.

Knowledge-poor features. The facts used to model the data, or features, are linguistically very simple, or "knowledge-poor", but yet succeed in approximating complex linguistic relationships.

Reusable software technology. The mathematics of the maximum entropy framework are essentially independent of any particular task, and a single software implementation can be used for all of the probability models in this thesis.

The experiments in this thesis suggest that experimenters can obtain state-of-the-art accuracies on a wide range of natural language tasks, with little task-specific effort, by using maximum entropy probability models.
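As a rough illustration of what a task-independent maximum entropy implementation with "knowledge-poor" features might look like, the sketch below trains a conditional log-linear model p(label | context) proportional to exp(sum of the weights of the active features). Everything in it is an assumption made for illustration: the function names (`features`, `probs`, `train`), the toy sentence-boundary data, and the feature strings are hypothetical, and plain gradient ascent on the log-likelihood is swapped in for brevity; it is not claimed to be the estimation procedure used in the thesis.

```python
import math
from collections import defaultdict

def features(context, label):
    """Binary features: each pairs one simple fact about the context with a label."""
    return [(fact, label) for fact in context]

def probs(weights, context, labels):
    """Return p(label | context) under the log-linear (maximum entropy) form."""
    scores = {y: sum(weights[f] for f in features(context, y)) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(data, labels, epochs=200, lr=0.5):
    """Gradient ascent on the conditional log-likelihood (an assumed, simple optimizer)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for context, gold in data:
            p = probs(weights, context, labels)
            for f in features(context, gold):
                grad[f] += 1.0       # observed feature count
            for y in labels:
                for f in features(context, y):
                    grad[f] -= p[y]  # expected feature count under the model
        for f, g in grad.items():
            weights[f] += lr * g / len(data)
    return weights

# Hypothetical toy data: does a period end the sentence ("yes") or not ("no")?
LABELS = ("yes", "no")
DATA = [
    (("prev=Mr", "next_cap"), "no"),
    (("prev=Dr", "next_cap"), "no"),
    (("prev=word", "next_cap"), "yes"),
    (("prev=home", "next_cap"), "yes"),
    (("prev=etc", "next_lower"), "no"),
]

if __name__ == "__main__":
    w = train(DATA, LABELS)
    print(probs(w, ("prev=Mr", "next_cap"), LABELS))
    print(probs(w, ("prev=word", "next_cap"), LABELS))
```

Only `features`, `LABELS`, and `DATA` are specific to the toy task; `probs` and `train` never inspect what a feature means, which is the sense in which one implementation could in principle be reused across tasks.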

Mitchell P. Marcus | Adwait Ratnaparkhi

[1] E. Jaynes. Information Theory and Statistical Mechanics, 1957.

[2] I. Good. Maximum Entropy for Hypothesis Formulation, Especially for Multidimensional Contingency Tables, 1963.

[3] J. Darroch, et al. Generalized Iterative Scaling for Log-Linear Models, 1972.

[4] I. Csiszár. $I$-Divergence Geometry of Probability Distributions and Minimization Problems, 1975.

[5] B. Shalit. Structural ambiguity and limits to coping, 1977, Journal of human stress.

[6] H. Künkel. Frequency analysis, 1978, Electroencephalography and clinical neurophysiology. Supplement.

[7] Mitchell P. Marcus, et al. A theory of syntactic recognition for natural language, 1979.

[8] Patrick Henry Winston, et al. A Theory of Syntactic Recognition for Natural Language, 1982.

[9] W. Nelson Francis, et al. Frequency Analysis of English Usage: Lexicon and Grammar, 1983.

[10] Geoffrey Leech, et al. The tagged LOB Corpus: user's manual, 1986.

[11] Alfred V. Aho, et al. Compilers: Principles, Techniques, and Tools, 1986, Addison-Wesley series in computer science / World student series edition.

[12] Richard J. Larsen, Morris L. Marx. An introduction to mathematical statistics and its applications, 1986.

[13] Slava M. Katz, et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer, 1987, IEEE Trans. Acoust. Speech Signal Process.

[14] I. Csiszár. A geometric interpretation of Darroch and Ratcliff's generalized iterative scaling, 1989.

[15] Michael Riley, et al. Some Applications of Tree-based Modelling to Speech and Language, 1989, HLT.

[16] David W. Hosmer, et al. Applied Logistic Regression, 1991.

[17] Kenneth Ward Church. A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text, 1988, ANLP.

[18] Steven Abney, et al. Parsing By Chunks, 1991.

[19] David D. Lewis, et al. Representation and Learning in Information Retrieval, 1991.

[20] Ralph Grishman, et al. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars, 1991, HLT.

[21] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[22] Mats Rooth, et al. Structural Ambiguity and Lexical Relations, 1991, ACL.

[23] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.