An Empirical Comparison of Probability Models for Dependency Grammar

This technical report is an appendix to Eisner (1996): it gives the superior experimental results that were reported only in the talk version of that paper, together with details of how they were obtained. Eisner (1996) trained three probability models on a small set of about 4,000 conjunction-free dependency-grammar parses derived from the Wall Street Journal section of the Penn Treebank, and then evaluated the models on a held-out test set, using a novel O(n³) parsing algorithm. The present paper describes some details of the experiments and repeats them with a larger training set of 25,000 sentences. As reported at the talk, the more extensive training yields greatly improved performance, cutting the error rate of Eisner (1996) in half. Nearly half the sentences are parsed with no misattachments; two-thirds are parsed with at most one misattachment. Of the models described in the original paper, the best score is obtained with the generative “model C,” which attaches 87–88% of all words to the correct parent. However, better models are also explored, in particular two simple variants of the comprehension “model B.” The better of these has an attachment accuracy of 90%, and (unlike model C) tags words more accurately than the comparable trigram tagger. If tags are roughly known in advance, search error is all but eliminated and the new model attains an attachment accuracy of 93%. We find that the parser of Collins (1996), when combined with a highly trained tagger, also achieves 93% when trained and tested on the same sentences. We briefly discuss the similarities and differences between Collins’s model and ours, pointing out the strengths of each and noting that these strengths could be combined for either dependency parsing or phrase-structure parsing.
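The evaluation metrics named above (per-word attachment accuracy and the fraction of sentences parsed with no misattachments) can be illustrated with a minimal sketch. This is not code from the paper; the representation of a parse as a list of parent indices (0 = root) and the function name are illustrative assumptions.

```python
# Sketch (assumed representation, not from the paper): each parse is a
# list of parent indices, one per word, with 0 denoting the root.

def attachment_stats(gold_parses, predicted_parses):
    """Return (per-word attachment accuracy,
               fraction of sentences with no misattachments)."""
    correct_words = total_words = perfect_sentences = 0
    for gold, pred in zip(gold_parses, predicted_parses):
        assert len(gold) == len(pred), "parses must cover the same words"
        hits = sum(g == p for g, p in zip(gold, pred))
        correct_words += hits
        total_words += len(gold)
        if hits == len(gold):
            perfect_sentences += 1
    return (correct_words / total_words,
            perfect_sentences / len(gold_parses))

# Toy example: two sentences, one parsed perfectly, one with a single
# misattached word (last word attached to word 2 instead of word 3).
gold = [[2, 0, 2], [2, 0, 2, 3]]
pred = [[2, 0, 2], [2, 0, 2, 2]]
acc, exact = attachment_stats(gold, pred)  # 6 of 7 words correct; 1 of 2 sentences exact
```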

[1]  H. Kucera et al.  Computational analysis of present-day American English, 1967.

[2]  Kenneth Ward Church.  A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text, 1988, ANLP.

[3]  Ralph Grishman et al.  A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars, 1991, HLT.

[4]  Beatrice Santorini et al.  Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[5]  Daniel Dominic Sleator et al.  Parsing English with a Link Grammar, 1995, IWPT.

[6]  Srinivas Bangalore et al.  Disambiguation of Super Parts of Speech (or Supertags): Almost Parsing, 1995.

[7]  Ann Bies et al.  The Penn Treebank: Annotating Predicate Argument Structure, 1994, HLT.

[8]  Dekang Lin et al.  A dependency-based method for evaluating broad-coverage parsers, 1995, Natural Language Engineering.

[9]  Michael Collins et al.  Prepositional Phrase Attachment through a Backed-off Model, 1995, VLC@ACL.

[10]  Eugene Charniak et al.  Figures of Merit for Best-First Probabilistic Chart Parsing, 1998, Computational Linguistics.

[11]  Michael Collins et al.  A New Statistical Parser Based on Bigram Lexical Dependencies, 1996, ACL.

[12]  Raymond J. Mooney et al.  Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning, 1996, EMNLP.

[13]  Adwait Ratnaparkhi et al.  A Maximum Entropy Model for Part-Of-Speech Tagging, 1996, EMNLP.

[14]  Jason Eisner et al.  Three New Probabilistic Models for Dependency Parsing: An Exploration, 1996, COLING.

[15]  Stanley F. Chen et al.  An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.