Head-Driven Statistical Models for Natural Language Parsing

This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.
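
As an illustration of the head-driven, top-down decomposition described above (a minimal sketch of the general scheme, not the paper's exact parameterization), the expansion of a parent nonterminal $P$ carrying head word $h$ into a head child $H$ and sequences of left and right modifiers can be factored as

\[
\Pr(\mathit{RHS} \mid P, h) \;=\; \Pr_{h}(H \mid P, h)\;\times\;\prod_{i}\Pr_{l}\big(L_i(l_i) \mid P, H, h, \Delta_l(i)\big)\;\times\;\prod_{j}\Pr_{r}\big(R_j(r_j) \mid P, H, h, \Delta_r(j)\big),
\]

where $L_i(l_i)$ and $R_j(r_j)$ are modifier nonterminals paired with their head words and $\Delta$ is a distance function encoding adjacency, which is what gives the models their preference for close attachment. In the richer models, subcategorization frames and a gap feature for wh-movement are added to the conditioning contexts, and the sparse lexically conditioned estimates are smoothed by backing off to less specific contexts.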
