Paper information - Back-off as Parameter Estimation for DOP models

Data-Oriented Parsing (DOP) is a probabilistic performance approach to parsing natural language. Several DOP models have been proposed since Scha (1990) introduced the approach, with promising results. An important feature of these models is the probability estimation procedure. Two major estimators have been put forward: Bod (1993) uses a relative frequency estimator; Bonnema (1999) adds a rescaling factor to correct for tree size effects. Both estimators, however, are biased. Moreover, Bod’s estimator has been shown to be inconsistent (Johnson, 2002): its probability estimates do not approach the true probabilities that generated the data as the sample size grows. In this thesis, we implement a new estimation procedure that tackles the shortcomings of the two previous methods. The main idea is to treat derivation events not as disjoint, but as interrelated in a hierarchical cascade of parse tree derivations. We show that this new estimator, called the Back-Off DOP (BO-DOP) estimator, outperforms both previous estimators. We tested it on the OVIS treebank, which comes from a Dutch-language, speech-based system, and report error reductions of up to 11.4% and 15% relative to Bod’s and Bonnema’s estimators, respectively.
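As a rough illustration of the relative frequency estimator mentioned in the abstract, the sketch below assigns each tree fragment its count divided by the total count of all fragments sharing the same root label. This is only a minimal sketch: the function name `relative_frequency`, the `(root, fragment)` key layout, and the toy counts are assumptions made for illustration and are not taken from the thesis. It does not implement the rescaling or back-off estimators.

```python
from collections import Counter, defaultdict


def relative_frequency(fragment_counts):
    """Return count(f) / total count of fragments with the same root, per fragment.

    `fragment_counts` maps hypothetical (root_label, fragment_string) keys
    to their observed counts in a treebank.
    """
    # Sum the counts of all fragments that share a root label.
    totals = defaultdict(int)
    for (root, _fragment), count in fragment_counts.items():
        totals[root] += count
    # Normalise each fragment count by the total for its root label.
    return {
        key: count / totals[key[0]]
        for key, count in fragment_counts.items()
    }


# Toy counts, invented purely for illustration.
toy_counts = Counter({
    ("S", "(S NP VP)"): 3,
    ("S", "(S (NP she) VP)"): 1,
    ("NP", "(NP she)"): 2,
    ("NP", "(NP he)"): 2,
})

probs = relative_frequency(toy_counts)
```

By construction, the probabilities of all fragments with the same root sum to one, so each root label defines its own distribution over fragments.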

L. Buratto

[1] Vincenzo Lombardo, et al. Incrementality and Lexicalism: A Treebank Study, 2002.

[2] Edith Cohen, et al. Labeling dynamic XML trees, 2002, SIAM J. Comput.

[3] Mark Johnson, et al. Squibs and Discussions: The DOP Estimation Method is Biased and Inconsistent, 2002, CL.

[4] Rens Bod, et al. What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy?, 2001, ACL.

[5] K. Sima'an. Tree-gram Parsing: Lexical Dependencies and Structural Relations, 2000, ACL.

[6] Rens Bod, et al. Parsing with the Shortest Derivation, 2000, COLING.

[7] Thorsten Brants, et al. Probabilistic Parsing and Psychological Plausibility, 2000, COLING.

[8] Eugene Charniak, et al. A Maximum-Entropy-Inspired Parser, 2000, ANLP.

[9] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.

[10] Khalil Sima'an, et al. Learning Efficient Disambiguation, 1999, ArXiv.

[11] Martin J. Pickering, et al. The rational analysis of inquiry: The case of parsing, 1998.