Back-off as Parameter Estimation for DOP models

Data-Oriented Parsing (DOP) is a probabilistic performance approach to parsing natural language. Several DOP models have been proposed since Scha (1990) introduced the approach, and they have achieved promising results. A central component of these models is the procedure for estimating probabilities. Two major estimators have been put forward: Bod (1993) uses a relative frequency estimator, and Bonnema (1999) adds a rescaling factor to correct for tree size effects. Both estimators, however, are biased. Moreover, Bod’s estimator has been shown to be inconsistent (Johnson, 2002): its probability estimates do not approach the true probabilities that generated the data as the sample size grows. In this thesis, we implement a new estimation procedure that addresses the shortcomings of both methods. The main idea is to treat derivation events not as disjoint but as interrelated in a hierarchical cascade of parse tree derivations. We show that this new estimator, called the Back-Off DOP (BO-DOP) estimator, outperforms both previous estimators. We tested it on the OVIS treebank, which comes from a Dutch-language, speech-based system, and report error reductions of up to 11.4% and 15% relative to Bod’s and Bonnema’s estimators, respectively.
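As a rough illustration of the relative frequency estimator mentioned above, the sketch below assigns each fragment its count divided by the total count of all fragments sharing the same root label, which is how the estimator is commonly described for DOP. The function name `relative_frequency`, the string encoding of fragments, and the toy counts are assumptions made for this example only; they are not taken from the thesis, and the sketch does not cover Bonnema's rescaling or the back-off estimator itself.

```python
from collections import Counter, defaultdict


def relative_frequency(fragment_counts):
    """Return P(fragment | root) = count(fragment) / sum of counts with that root.

    `fragment_counts` maps (root_label, fragment_string) pairs to counts.
    This data layout is a hypothetical choice for illustration.
    """
    # Total count of fragments per root label.
    root_totals = defaultdict(int)
    for (root, _fragment), count in fragment_counts.items():
        root_totals[root] += count

    # Normalise each fragment count by the total for its root.
    return {
        key: count / root_totals[key[0]]
        for key, count in fragment_counts.items()
    }


# Toy fragment counts, invented for illustration (not from any treebank).
counts = Counter({
    ("S", "S(NP VP)"): 3,
    ("S", "S(NP VP(V NP))"): 1,
    ("NP", "NP(she)"): 2,
    ("NP", "NP(trains)"): 2,
})

probs = relative_frequency(counts)
for (root, fragment), p in sorted(probs.items()):
    print(f"{root:>2}  {fragment:<16} {p:.2f}")
```

Because the normalisation is done per root label, the probabilities of all fragments with the same root sum to one, so a derivation's probability can be taken as the product of its fragments' probabilities.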

[1]  Vincenzo Lombardo,et al.  Incrementality and Lexicalism: A Treebank Study, 2002.

[2]  Edith Cohen,et al.  Labeling dynamic XML trees , 2002, SIAM J. Comput.

[3]  Mark Johnson,et al.  Squibs and Discussions: The DOP Estimation Method is Biased and Inconsistent , 2002, CL.

[4]  Rens Bod,et al.  What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy? , 2001, ACL.

[5]  K. Sima'an.  Tree-gram Parsing: Lexical Dependencies and Structural Relations , 2000, ACL.

[6]  Rens Bod,et al.  Parsing with the Shortest Derivation , 2000, COLING.

[7]  Thorsten Brants,et al.  Probabilistic Parsing and Psychological Plausibility , 2000, COLING.

[8]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[9]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[10]  Khalil Sima'an,et al.  Learning Efficient Disambiguation , 1999, ArXiv.

[11]  Martin J. Pickering,et al.  The rational analysis of inquiry: The case of parsing, 1998.

[12]  Rens Bod,et al.  A Probabilistic Corpus-Driven Model for Lexical-Functional Analysis , 1998, ACL.

[13]  Joshua Goodman,et al.  Parsing Inside-Out , 1998, ArXiv.

[14]  David J. Weir,et al.  Encoding Frequency Information in Lexicalized Grammars , 1997, IWPT.

[15]  Rens Bod,et al.  A DOP Model for Semantic Interpretation , 1997, ACL.

[16]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[17]  Nick Chater,et al.  Reconciling simplicity and likelihood principles in perceptual organization , 1996, Psychological Review.

[18]  Stanley F. Chen,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[19]  Rens Bod,et al.  Two Questions about Data-Oriented Parsing , 1996, VLC@COLING.

[20]  David M. Magerman.  Statistical Decision-Tree Models for Parsing , 1995, ACL.

[21]  Richard M. Schwartz,et al.  Coping with Ambiguity and Unknown Words through Probabilistic Models , 1993, CL.

[22]  Rens Bod,et al.  Using an Annotated Corpus as a Stochastic Grammar , 1993, EACL.

[23]  Ralph Grishman,et al.  A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars , 1991, HLT.

[24]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process.

[25]  William Marslen-Wilson,et al.  Linguistic Structure and Speech Shadowing at Very Short Latencies , 1973, Nature.

[26]  K. Sima'an,et al.  Enhancing the Robustness of Data Oriented Parsing of Speech-Understanding, 2001.

[27]  L. Hoogweg,et al.  Extending DOP1 with the Insertion Operation, 2000.

[28]  Vijay K. Shanker,et al.  Automated Extraction of TAGs from the Penn Treebank , 2000, IWPT.

[29]  R. Bonnema.  A New Probability Model for Data Oriented Parsing, 1999.

[30]  Frederick Jelinek,et al.  Statistical methods for speech recognition, 1997.

[31]  Matthew W. Crocker,et al.  Mechanisms for Sentence Processing, 1996.

[32]  William A. Gale,et al.  Good-Turing Frequency Estimation Without Tears , 1995, J. Quant. Linguistics.

[33]  Kenneth Ward Church,et al.  Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table , 1982, CL.

[34]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data, 1980.

[35]  Steven Abney,et al.  The English Noun Phrase in its Sentential Aspect, 1972.

[36]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process, 1972.