Do all fragments count?

We aim to find the minimal set of fragments that achieves maximal parse accuracy in Data-Oriented Parsing (DOP). Experiments with the Penn Wall Street Journal (WSJ) treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank. We isolate a number of dependency relations that previous models neglect but that contribute to higher accuracy. We show that the history of statistical parsing models displays a tendency towards using more and larger fragments from training data.
