The Proposition Bank: An Annotated Corpus of Semantic Roles

The Proposition Bank project takes a practical approach to semantic representation, adding a layer of predicate-argument information, or semantic role labels, to the syntactic structures of the Penn Treebank. The resulting resource can be thought of as shallow, in that it does not represent coreference, quantification, and many other higher-order phenomena, but also broad, in that it covers every instance of every verb in the corpus and allows representative statistics to be calculated. We discuss the criteria used to define the sets of semantic roles for the annotation process, and we analyze the frequency of syntactic/semantic alternations in the corpus. We describe an automatic system for semantic role tagging trained on the corpus and discuss how various types of information affect its performance, including a comparison of full syntactic parsing with a flat representation and the contribution of the empty trace categories of the treebank.
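
To make the idea of a predicate-argument layer concrete, the following is a minimal sketch of how one such annotation might be represented and how a simple alternation statistic could be counted. It is not the corpus's actual file format or the authors' tooling: the class names, the `is_subject` flag, and the helper `subject_role_counts` are hypothetical, and the two sentences are illustrative rather than drawn from the corpus. Only the numbered-argument label style (`Arg0`, `Arg1`) follows PropBank convention.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    label: str        # numbered role label, e.g. "Arg0" or "Arg1"
    start: int        # index of the first token in the span (inclusive)
    end: int          # index one past the last token in the span (exclusive)
    is_subject: bool  # hypothetical flag: span fills the syntactic subject slot


@dataclass(frozen=True)
class Proposition:
    tokens: tuple          # the sentence as a tuple of word strings
    predicate_index: int   # position of the verb within tokens
    roleset: str           # illustrative verb-sense identifier
    arguments: tuple       # Argument instances attached to this verb


def subject_role_counts(propositions):
    """Count which role label occupies the subject position across propositions."""
    counts = Counter()
    for prop in propositions:
        for arg in prop.arguments:
            if arg.is_subject:
                counts[arg.label] += 1
    return counts


# Causative/inchoative alternation: "the window" keeps the same role label
# (Arg1) whether it surfaces as object or as subject.
transitive = Proposition(
    tokens=("John", "broke", "the", "window"),
    predicate_index=1,
    roleset="break.01",
    arguments=(
        Argument("Arg0", 0, 1, True),
        Argument("Arg1", 2, 4, False),
    ),
)
intransitive = Proposition(
    tokens=("The", "window", "broke"),
    predicate_index=2,
    roleset="break.01",
    arguments=(Argument("Arg1", 0, 2, True),),
)

counts = subject_role_counts([transitive, intransitive])
print(dict(counts))
```

Because the role label stays attached to the participant rather than to its syntactic position, tallying labels by position over every verb instance is one plausible way to obtain the kind of alternation frequencies the abstract mentions.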
