Treebank-based acquisition of wide-coverage, probabilistic LFG resources: project overview, results and evaluation

This paper presents an overview of a project to acquire wide-coverage, probabilistic Lexical-Functional Grammar (LFG) resources from treebanks. Our approach is based on an automatic annotation algorithm that annotates “raw” treebank trees with LFG f-structure information approximating basic predicate-argument/dependency structure. From the f-structure-annotated treebank we extract probabilistic unification grammar resources. We present the annotation algorithm, the extraction of lexical information, and the acquisition of wide-coverage, robust PCFG-based LFG approximations, including long-distance dependency resolution. We show how the methodology can be applied to multilingual, treebank-based unification grammar acquisition. Finally, we show how simple (quasi-)logical forms can be derived automatically from the f-structures generated for the treebank trees.
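The idea of annotating treebank trees with f-structure information can be pictured with a toy sketch. Everything in it is an illustrative assumption rather than the project's actual algorithm: the `Node` class, the `RULES` table keyed on mother and daughter categories, and the example sentence are all hypothetical. Each daughter is marked either as a head (its information is shared with the mother, as in the LFG equation ↑=↓) or with a grammatical function such as SUBJ or OBJ, and a simple predicate-argument structure is then read off the annotated tree.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    label: str                        # phrase or part-of-speech category
    children: list = field(default_factory=list)
    word: Optional[str] = None        # set on leaf nodes only
    annotation: str = ""              # "head" or a grammatical function name

# Hypothetical annotation table: (mother, daughter) -> annotation.
# A real annotation algorithm would use far richer context than this.
RULES = {
    ("S", "NP"): "SUBJ",
    ("S", "VP"): "head",
    ("VP", "V"): "head",
    ("VP", "NP"): "OBJ",
    ("NP", "N"): "head",
}

def annotate(node):
    """Attach an annotation to every daughter, top-down."""
    for child in node.children:
        child.annotation = RULES.get((node.label, child.label), "")
        annotate(child)

def f_structure(node):
    """Build a nested dict approximating predicate-argument structure."""
    if node.word is not None:
        return {"PRED": node.word}
    fs = {}
    for child in node.children:
        sub = f_structure(child)
        if child.annotation == "head":
            fs.update(sub)                # head daughter shares the mother's structure
        elif child.annotation:
            fs[child.annotation] = sub    # argument daughter fills a function slot
    return fs

tree = Node("S", [
    Node("NP", [Node("N", word="John")]),
    Node("VP", [Node("V", word="saw"),
                Node("NP", [Node("N", word="Mary")])]),
])
annotate(tree)
result = f_structure(tree)
print(result)
```

The sketch merges head information with a plain dictionary update instead of true unification, so it cannot detect clashing values; it is meant only to show how functional annotations on tree nodes determine the resulting structure.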
