Automated extraction of Tree-Adjoining Grammars from treebanks

Interest in stochastic models of parsing has surged in recent years. The use of tree-adjoining grammar (TAG) in this domain has been relatively limited, due in part to the unavailability, until recently, of large-scale corpora hand-annotated with TAG structures. Our goals are to develop inexpensive means of generating such corpora and to demonstrate their applicability to stochastic modeling. We present a method for automatically extracting a linguistically plausible TAG from the Penn Treebank. We also introduce low-labor methods for inducing higher-level organization of TAGs. Empirically, we evaluate various automatically extracted TAGs and demonstrate how the induced higher-level organization of TAGs can be used to smooth stochastic TAG models.
