Building lexical resources for PrincPar, a large coverage parser that generates principled semantic representations

Parsing, one of the more successful areas of Natural Language Processing, has mostly been concerned with syntactic structure. Although uncovering the syntactic structure of sentences is important, many applications also require a meaning representation for the input. We report on PrincPar, a parser that builds full meaning representations. It integrates LCFLEX, a robust parser, with a lexicon and ontology derived from two lexical resources, VerbNet and CoreLex, which represent the semantics of verbs and nouns respectively. We show that these two resources, one focused on verbs and the other on nouns, can be successfully integrated. We report parsing results on a corpus of instructional text and assess the coverage of the two lexical resources. Our evaluation metric is the number of verb frames that are assigned a correct semantics: 72.2% of verb frames are assigned a perfect semantics, and another 10.9% are assigned a partially correct semantics. Our ultimate goal is to develop a (semi)automatic method to derive domain knowledge from instructional text, in the form of linguistically motivated action schemes.
