Natural Language Parsing as Statistical Pattern Recognition

Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. Most of the grammarian's effort is devoted to the disambiguation process: first hypothesizing rules that dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules. In this work, I propose an automatic method for acquiring a statistical parser from a set of parsed sentences, one that takes advantage of some initial linguistic input but avoids the pitfalls of the iterative and seemingly endless grammar development process. Based on distributionally derived and linguistically based features of language, this parser acquires a set of statistical decision trees that assign a probability distribution on the space of parse trees given the input sentence. These decision trees take advantage of a significant amount of contextual information, potentially including all of the lexical information in the sentence, to produce highly accurate statistical models of the disambiguation process. By basing the selection of disambiguation criteria on entropy reduction rather than human intuition, this parser development method is able to consider more sentences than a human grammarian can when making individual disambiguation rules. In experiments comparing a parser acquired using this statistical framework with a grammarian's rule-based parser developed over a ten-year period, both using the same training material and test sentences, the decision tree parser significantly outperformed the grammar-based parser on the accuracy measure that the grammarian was trying to maximize, achieving an accuracy of 78% compared to the grammar-based parser's 69%.
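The idea of choosing disambiguation criteria by entropy reduction can be illustrated with a minimal sketch. This is not the parser described above: the function names (`entropy`, `entropy_reduction`, `best_question`), the feature names (`prep`, `verb`), and the four toy events are all hypothetical and made up for illustration. The sketch assumes each training event pairs a context (a dictionary of feature values) with the correct decision, and it picks the feature whose answers most reduce the entropy of that decision, as a decision tree learner would at a single node.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution over labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_reduction(events, feature):
    """Expected drop in decision entropy after asking about one feature."""
    before = entropy([decision for _, decision in events])
    # Partition the decisions by the value the feature takes in each context.
    by_answer = {}
    for context, decision in events:
        by_answer.setdefault(context[feature], []).append(decision)
    # Weight each partition's entropy by its share of the events.
    after = sum(
        len(decisions) / len(events) * entropy(decisions)
        for decisions in by_answer.values()
    )
    return before - after


def best_question(events, features):
    """Pick the feature with the largest entropy reduction (one tree node)."""
    return max(features, key=lambda f: entropy_reduction(events, f))


# Hypothetical toy data: does a prepositional phrase attach to the verb
# or to the noun? These events are invented, not taken from any corpus.
EVENTS = [
    ({"prep": "with", "verb": "saw"}, "verb"),
    ({"prep": "with", "verb": "ate"}, "verb"),
    ({"prep": "of", "verb": "saw"}, "noun"),
    ({"prep": "of", "verb": "ate"}, "noun"),
]

if __name__ == "__main__":
    for name in ("prep", "verb"):
        print(name, entropy_reduction(EVENTS, name))
    print("chosen:", best_question(EVENTS, ["prep", "verb"]))
```

In this toy data the preposition fully determines the attachment, so asking about it removes the whole bit of uncertainty, while asking about the verb removes none. A full decision tree would apply the same selection recursively to each partition and store the remaining distribution over decisions at each leaf.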
