What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy?

We aim to find the minimal set of fragments that achieves maximal parse accuracy in Data-Oriented Parsing. Experiments with the Penn Wall Street Journal treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank (a precision of 90.8% and a recall of 90.6%). We isolate some dependency relations that previous models neglect but that contribute to higher parse accuracy.
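
To give a sense of why restricting the fragment set matters, the following sketch counts how many fragments a single parse tree yields. It is an illustration only, not the procedure used in the experiments. It assumes the usual Data-Oriented Parsing convention that a fragment keeps either all or none of a node's children, that a nonterminal child may be left unexpanded as a frontier node, and that words are never cut off from their parent. The toy tree and the function names are hypothetical.

```python
from math import prod

# Toy parse tree (hypothetical example): a node is (label, child, ...),
# and a plain string is a word.
TREE = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "her")))


def fragments_rooted_at(node):
    """Count the fragments whose root is this node.

    Each nonterminal child is either left as a frontier node (1 way)
    or expanded by one of its own fragments, so it contributes a factor
    of 1 + fragments_rooted_at(child). Words contribute no choice.
    """
    return prod(
        1 + fragments_rooted_at(child)
        for child in node[1:]
        if not isinstance(child, str)
    )


def total_fragments(node):
    """Count the fragments rooted at this node or at any node below it."""
    return fragments_rooted_at(node) + sum(
        total_fragments(child)
        for child in node[1:]
        if not isinstance(child, str)
    )


if __name__ == "__main__":
    print(fragments_rooted_at(TREE))
    print(total_fragments(TREE))
```

Because the count at each node multiplies the counts of its children, the number of fragments grows very quickly with tree size, which is what makes the choice of a small but sufficient fragment set a practical question.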
