Data and models for statistical parsing with combinatory categorial grammar

This dissertation is concerned with the creation of training data and the development of probability models for statistical parsing of English with Combinatory Categorial Grammar (CCG). Parsing, or syntactic analysis, is a prerequisite for semantic interpretation, and therefore forms an integral part of any system that requires natural language understanding. Since almost all naturally occurring sentences are ambiguous, it is not sufficient (and often impossible) to generate all possible syntactic analyses. Instead, the parser needs to rank competing analyses and select only the most likely ones. A statistical parser uses a probability model to perform this task. I propose a number of ways in which such probability models can be defined for CCG. The kinds of models developed in this dissertation, generative models over normal-form derivation trees, are particularly simple, and have the further property of restricting the set of syntactic analyses to those corresponding to a canonical derivation structure. This is important to guarantee that parsing can be done efficiently.

To achieve high parsing accuracy, a large corpus of annotated data is required to estimate the parameters of the probability models. Most existing wide-coverage statistical parsers use models of phrase-structure trees estimated from the Penn Treebank, a 1-million-word corpus of manually annotated sentences from the Wall Street Journal. This dissertation presents an algorithm which translates the phrase-structure analyses of the Penn Treebank to CCG derivations. The resulting corpus, CCGbank, is used to train and test the models proposed in this dissertation. Experimental results indicate that parsing accuracy, when evaluated according to a comparable metric (the recovery of unlabelled word-word dependency relations), is as high as that of standard Penn Treebank parsers which use similar modelling techniques.

Most existing wide-coverage statistical parsers use simple phrase-structure grammars whose syntactic analyses fail to capture long-range dependencies, and therefore do not correspond to directly interpretable semantic representations. By contrast, CCG is a grammar formalism in which semantic representations that include long-range dependencies can be built directly during the derivation of syntactic structure. These dependencies define the predicate-argument structure of a sentence, and are used for two purposes in this dissertation. First, the performance of the parser can be evaluated according to how well it recovers these dependencies. In contrast to purely syntactic evaluations, this yields a direct measure of how accurate the semantic interpretations returned by the parser are. Second, I propose a generative model that captures the local and non-local dependencies in the predicate-argument structure, and investigate the impact of modelling non-local in addition to local dependencies.
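
The dependency-based evaluation described above can be illustrated with a minimal Python sketch. It is not the evaluation code used in the dissertation. It assumes that each unlabelled dependency is a directed pair of word positions (head, dependent) and that accuracy is reported as precision, recall and F1 over such pairs. The function name `unlabelled_dependency_scores` and the example indices are hypothetical.

```python
from typing import Iterable, Tuple

# Assumption: an unlabelled dependency is a (head_index, dependent_index) pair.
Dependency = Tuple[int, int]


def unlabelled_dependency_scores(
    gold: Iterable[Dependency], predicted: Iterable[Dependency]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted dependencies against gold."""
    gold_set = set(gold)
    pred_set = set(predicted)
    correct = len(gold_set & pred_set)  # dependencies recovered by the parser

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical four-word sentence: the parser attaches word 0 to the wrong head.
gold_deps = {(2, 0), (2, 1), (2, 3)}
parser_deps = {(3, 0), (2, 1), (2, 3)}
precision, recall, f1 = unlabelled_dependency_scores(gold_deps, parser_deps)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because the comparison is over sets of word pairs rather than brackets, the same function can score local and long-range dependencies alike, provided both appear in the gold and predicted sets.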

[1]  Y. Bar-Hillel A Quasi-Arithmetical Notation for Syntactic Description , 1953 .

[2]  J. Lambek The Mathematics of Sentence Structure , 1958 .

[3]  Haskell B. Curry,et al.  Combinatory Logic, Volume I , 1959 .

[4]  T. E. Harris,et al.  The Theory of Branching Processes. , 1963 .

[5]  Yehoshua Bar-Hillel,et al.  Language and Information , 1964 .

[6]  Tadao Kasami,et al.  An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages , 1965 .

[7]  Daniel H. Younger,et al.  Recognition and Parsing of Context-Free Languages in Time n^3 , 1967, Inf. Control..

[8]  John Robert Ross,et al.  Constraints on variables in syntax , 1967 .

[9]  Taylor L. Booth,et al.  Applying Probability Measures to Abstract Languages , 1973, IEEE Transactions on Computers.

[10]  Aravind K. Joshi,et al.  Tree Adjunct Grammars , 1975, J. Comput. Syst. Sci..

[11]  J. Baker Trainable grammars for speech recognition , 1979 .

[12]  M. Baltin,et al.  The Mental representation of grammatical relations , 1985 .

[13]  Kent Barrows Wittenburg,et al.  Natural language parsing with combinatory categorial grammar in a graph-unification-based formalism , 1986 .

[14]  Hans Uszkoreit Categorial Unification Grammars , 1986, COLING.

[15]  Mark Steedman,et al.  Combinatory grammars and parasitic gaps , 1987 .

[16]  Kent Wittenburg,et al.  Predictive Combinators: a Method for Efficient Processing of Combinatory Categorial Grammars , 1987, ACL.

[17]  Mark Steedman,et al.  A Lazy way to Chart-Parse with Categorial Grammars , 1987, ACL.

[18]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[19]  李幼升,et al.  Ph , 1989 .

[20]  Glyn Morrill,et al.  Parsing and Derivational Equivalence , 1989, EACL.

[21]  Kent Wittenburg,et al.  Zero Morphemes in Unification-Based Combinatory Categorial Grammar , 1990, ACL.

[22]  David J. Weir,et al.  Polynomial Time Parsing of Combinatory Categorial Grammars , 1990, ACL.

[23]  Geoffrey Nunberg,et al.  The linguistics of punctuation , 1990 .

[24]  Ralph Grishman,et al.  A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars , 1991, HLT.

[25]  Bob Carpenter,et al.  The Generative Power of Categorial Grammars and Head-Driven Phrase Structure Grammars with Lexical Rules , 1991, Comput. Linguistics.

[26]  Mats Rooth,et al.  Structural Ambiguity and Lexical Relations , 1991, ACL.

[27]  Philip Resnik,et al.  Probabilistic Tree-Adjoining Grammar as a Framework for Statistical Natural Language Processing , 1992, COLING.

[28]  F. Pereira,et al.  Inside-Outside Reestimation From Partially Bracketed Corpora , 1992, ACL.

[29]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[30]  Yves Schabes,et al.  Stochastic Lexicalized Tree-adjoining Grammars , 1992, COLING.

[31]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[32]  David J. Weir,et al.  Parsing Some Constrained Grammar Formalisms , 1993, Comput. Linguistics.

[33]  Srinivas Bangalore,et al.  The Institute For Research In Cognitive Science Disambiguation of Super Parts of Speech ( or Supertags ) : Almost Parsing by Aravind , 1995 .

[34]  Ivan A. Sag,et al.  Book Reviews: Head-driven Phrase Structure Grammar and German in Head-driven Phrase-structure Grammar , 1996, CL.

[35]  Ann Bies,et al.  The Penn Treebank: Annotating Predicate Argument Structure , 1994, HLT.

[36]  David M. Magerman Natural Language Parsing as Statistical Pattern Recognition , 1994, ArXiv.

[37]  Srinivas Bangalore,et al.  Bootstrapping A Wide-Coverage CCG from FB-LTAG , 1994, ArXiv.

[38]  Bob Carpenter,et al.  Categorial Grammars, Lexical Rules and the English Predicative , 1995 .

[39]  Ann Bies,et al.  Bracketing Guidelines For Treebank II Style Penn Treebank Project , 1995 .

[40]  Eugene Charniak,et al.  Figures of Merit for Best-First Probabilistic Chart Parsing , 1998, Comput. Linguistics.

[41]  Jason Eisner Efficient Normal-Form Parsing for Combinatory Categorial Grammar , 1996, ACL.

[42]  Michael Collins,et al.  A New Statistical Parser Based on Bigram Lexical Dependencies , 1996, ACL.

[43]  Eugene Charniak,et al.  Tree-Bank Grammars , 1996, AAAI/IAAI, Vol. 2.

[44]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[45]  Frederick Jelinek,et al.  Statistical methods for speech recognition , 1997 .

[46]  Joshua Goodman,et al.  Probabilistic Feature Grammars , 1997, IWPT.

[47]  Eugene Charniak,et al.  Statistical Parsing with a Context-Free Grammar and Word Statistics , 1997, AAAI/IAAI.

[48]  Steven P. Abney Stochastic Attribute-Value Grammars , 1996, CL.

[49]  Joshua Goodman,et al.  Global Thresholding and Multiple-Pass Parsing , 1997, EMNLP.

[50]  Ted Briscoe,et al.  Learning Stochastic Categorial Grammars , 1997, CoNLL.

[51]  Christine Doran,et al.  Developing a Wide-Coverage CCG System , 1997 .

[52]  Aravind K. Joshi,et al.  Tree-Adjoining Grammars , 1997, Handbook of Formal Languages.

[53]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[54]  Mark Steedman,et al.  Surface structure and interpretation , 1996, Linguistic inquiry.

[55]  Ted Briscoe,et al.  Parser evaluation: a survey and a new proposal , 1998, LREC.

[56]  Mitchell P. Marcus,et al.  Adding Semantic Annotation to the Penn TreeBank , 1998 .

[57]  Rens Bod,et al.  Beyond Grammar: An Experience-Based Theory of Language , 1998 .

[58]  Mark Johnson,et al.  PCFG Models of Linguistic Tree Representations , 1998, CL.

[59]  Zhiyi Chi,et al.  Estimation of Probabilistic Context-Free Grammars , 1998, Comput. Linguistics.

[60]  Mitchell P. Marcus,et al.  Maximum entropy models for natural language ambiguity resolution , 1998 .

[61]  Anoop Sarkar,et al.  Conditions on Consistency of Probabilistic Tree Adjoining Grammars , 1998, ACL.

[62]  Rebecca Hwa Supervised Grammar Induction using Training Data with Limited Constituent Information , 1999, ACL.

[63]  Mark Johnson,et al.  Estimators for Stochastic “Unification-Based” Grammars , 1999, ACL.

[64]  Geoffrey E. Hinton Products of experts , 1999 .

[65]  Eric Brill,et al.  Exploiting Diversity in Natural Language Processing: Combining Parsers , 1999, EMNLP.

[66]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[67]  Ted Briscoe,et al.  Corpus Annotation for Parser Evaluation , 1999, ArXiv.

[68]  Eric Brill,et al.  Bagging and Boosting a Treebank Parser , 2000, ANLP.

[69]  David Chiang,et al.  Statistical Parsing with an Automatically-Extracted Tree Adjoining Grammar , 2000, ACL.

[70]  Thomas G. Dietterich Ensemble Methods in Machine Learning , 2000, Multiple Classifier Systems.

[71]  Ted Briscoe Grammatical acquisition: Inductive bias and coevolution of language and the language acquisition device , 2000 .

[72]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[73]  Owen Rambow,et al.  Tree adjoining grammars : formalisms, linguistic analysis, and processing , 2000 .

[74]  Eugene Charniak,et al.  Assigning Function Tags to Parsed Text , 2000, ANLP.

[75]  Fei Xia,et al.  A Uniform Method of Grammar Extraction and Its Applications , 2000, EMNLP.

[76]  K. Vijay-Shanker,et al.  Automated Extraction of TAGs from the Penn Treebank , 2000, IWPT.

[77]  Dan Klein,et al.  Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank , 2001, ACL.

[78]  Suresh Manandhar,et al.  Translating Treebank Annotation for Evaluation , 2001, ACL 2001.

[79]  Eugene Charniak,et al.  Immediate-Head Parsing for Language Models , 2001, ACL.

[80]  Mitchell P. Marcus,et al.  Smoothing a probablistic lexicon via syntactic transformations , 2001 .

[81]  Julia Hockenmaier,et al.  Statistical Parsing for CCG with Simple Generative Models , 2001, ACL.

[82]  Gann Bierner,et al.  Alternative Phrases Theoretical Analysis and Practical Application , 2001 .

[83]  Rens Bod What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy? , 2001, ACL.

[84]  Brian Roark,et al.  Probabilistic Top-Down Parsing and Language Modeling , 2001, CL.

[85]  John A. Carroll,et al.  Applied morphological processing of English , 2001, Natural Language Engineering.

[86]  Daniel Gildea,et al.  Corpus Variation and Parser Performance , 2001, EMNLP.

[87]  Andy Way,et al.  Automatic annotation of the Penn-treebank with LFG f-structureinformation , 2002 .

[88]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[89]  Mark Steedman,et al.  Acquiring Compact Lexicalized Grammars from a Cleaner Treebank , 2002, LREC.

[90]  Mark Johnson The DOP Estimation Method Is Biased and Inconsistent , 2002, Computational Linguistics.

[91]  Mark Steedman,et al.  Building Deep Dependency Structures using a Wide-Coverage CCG Parser , 2002, ACL.

[92]  Alexandra Kinyon,et al.  Identifying Verb Arguments and their Syntactic Function in the Penn Treebank , 2002, LREC.

[93]  Helmut Schmid Lexicalization of Probabilistic Grammars , 2002, COLING.

[94]  Mark Johnson,et al.  Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques , 2002, ACL.

[95]  Aline Villavicencio The acquisition of a unification-based generalised categorial grammar , 2002 .

[96]  Jason Baldridge,et al.  Coupling CCG and Hybrid Logic Dependency Semantics , 2002, ACL.

[97]  Martha Palmer,et al.  Adding predicate argument structure to the Penn TreeBank , 2002 .

[98]  Mark Steedman,et al.  Generative Models for Statistical Parsing with Combinatory Categorial Grammar , 2002, ACL.

[99]  Jason Baldridge,et al.  Lexically specified derivational control in combinatory categorial grammar , 2002 .

[100]  Stephen Clark,et al.  Supertagging for Combinatory Categorial Grammar , 2002, TAG+.

[101]  Mary P. Harper,et al.  The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources , 2002, EMNLP.

[102]  Mark Johnson,et al.  A Simple Pattern-matching Algorithm for Recovering Empty Nodes and their Antecedents , 2002, ACL.

[103]  Martha Palmer,et al.  From TreeBank to PropBank , 2002, LREC.

[104]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[105]  James R. Curran,et al.  Log-Linear Models for Wide-Coverage CCG Parsing , 2003, EMNLP.

[106]  Julia Hockenmaier Parsing with Generative Models of Predicate-Argument Structure , 2003, ACL.

[107]  Daniel Gildea,et al.  Identifying Semantic Roles Using Combinatory Categorial Grammar , 2003, EMNLP.

[108]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[109]  Dekang Lin,et al.  Dependency-Based Evaluation of Minipar , 2003 .

[110]  Julia Hockenmaier,et al.  Extending the Coverage of a CCG System , 2004 .

[111]  P. Kantor Foundations of Statistical Natural Language Processing , 2001, Information Retrieval.

[112]  Mark Steedman,et al.  The syntactic process , 2004, Language, speech, and communication.

[113]  David J. Weir,et al.  The equivalence of four extensions of context-free grammars , 1994, Mathematical systems theory.

[114]  Michael Collins,et al.  Discriminative Reranking for Natural Language Parsing , 2000, CL.

[115]  øöö Blockinø Providing Robustness for a CCG System , .