The Groningen Meaning Bank

The goal of the Groningen Meaning Bank (GMB) is to obtain a large corpus of English texts annotated with formal meaning representations. Since manually annotating a comprehensive corpus with deep semantic representations is a hard and time-consuming task, we use a bootstrapping approach. Existing language technology tools (for segmentation, part-of-speech tagging, named entity tagging, animacy labelling, syntactic parsing, and semantic processing) produce a reasonable approximation of the target annotations as a starting point. These machine-generated annotations are then refined with input from both expert linguists (using a wiki-like platform) and crowd-sourcing methods (in the form of a ‘Game with a Purpose’), which helps us decide how to resolve syntactic and semantic ambiguities. The result is a semantic resource that integrates various linguistic phenomena, including predicate-argument structure, scope, tense, thematic roles, rhetorical relations and presuppositions. The semantic formalism that brings all levels of annotation together in one meaning representation is Discourse Representation Theory, which supports meaning representations that can be translated to first-order logic. In contrast to ordinary treebanks, the units of annotation in the GMB are texts rather than isolated sentences. The current version of the GMB contains more than 10,000 public domain texts aligned with Discourse Representation Structures, and is freely available for research purposes.
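The translation from Discourse Representation Structures to first-order logic mentioned above can be illustrated with a minimal sketch. It follows the standard textbook translation for basic DRSs (discourse referents become existential quantifiers; the referents of an implication's antecedent become universal quantifiers) and is not the GMB's own implementation. The class names, the function names, and the output syntax are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Pred:
    """Basic condition, e.g. farmer(x). Hypothetical class, for illustration."""
    name: str
    args: Tuple[str, ...]


@dataclass
class Neg:
    """Negated DRS condition."""
    drs: "DRS"


@dataclass
class Imp:
    """Implicational condition: antecedent DRS => consequent DRS."""
    antecedent: "DRS"
    consequent: "DRS"


Condition = Union[Pred, Neg, Imp]


@dataclass
class DRS:
    """A basic DRS: a list of discourse referents plus a list of conditions."""
    referents: List[str]
    conditions: List[Condition]


def conj(conditions: List[Condition]) -> str:
    """Translate a list of conditions into a conjunction."""
    parts = [cond_to_fol(c) for c in conditions]
    if not parts:
        return "T"  # an empty conjunction is trivially true
    if len(parts) == 1:
        return parts[0]
    return "(" + " & ".join(parts) + ")"


def cond_to_fol(cond: Condition) -> str:
    """Translate a single DRS condition into a first-order formula."""
    if isinstance(cond, Pred):
        return f"{cond.name}({','.join(cond.args)})"
    if isinstance(cond, Neg):
        return f"-({drs_to_fol(cond.drs)})"
    if isinstance(cond, Imp):
        # Referents of the antecedent are universally quantified and
        # take scope over the whole implication.
        body = f"({conj(cond.antecedent.conditions)} -> {drs_to_fol(cond.consequent)})"
        for ref in reversed(cond.antecedent.referents):
            body = f"all {ref}.{body}"
        return body
    raise TypeError(f"unknown condition: {cond!r}")


def drs_to_fol(drs: DRS) -> str:
    """Referents of a DRS become existential quantifiers over its conditions."""
    body = conj(drs.conditions)
    for ref in reversed(drs.referents):
        body = f"exists {ref}.{body}"
    return body


# "Every farmer who owns a donkey beats it."
donkey = DRS([], [Imp(
    DRS(["x", "y"], [Pred("farmer", ("x",)), Pred("donkey", ("y",)), Pred("own", ("x", "y"))]),
    DRS([], [Pred("beat", ("x", "y"))]),
)])
print(drs_to_fol(donkey))
```

The sketch covers only predicates, negation and implication. The phenomena listed in the abstract (tense, presuppositions, rhetorical relations) would need a richer representation than the one assumed here.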
