Parallel Dependency Treebank Annotated with Interlinked Verbal Synonym Classes and Roles

We present an ongoing project of enriching the annotation of a parallel dependency treebank, namely the Prague Czech-English Dependency Treebank, with verb-centered semantic annotation using a bilingual synonym verb class lexicon, CzEngClass. This lexicon, in turn, links the predicate occurrences in the corpus to various external lexicons, such as FrameNet, VerbNet, PropBank frame files, OntoNotes, and WordNet. We briefly describe the content of the CzEngClass synonym class lexicon and then focus on its use for enriching the corpus annotation, which proceeds in two steps: automatic preprocessing and manual correction. This paper describes the first milestone of a long-term project; so far, approximately 100 CzEngClass classes, containing about 1800 different verbs for each of the two languages, Czech and English, are available for such annotation. The corpus coverage at the moment is about 50%, which allows us to extract some basic statistics and to identify a set of issues that appeared during the annotation process. The ultimate goal is to have a high-coverage, multilingual verbal synonym lexicon and corpora with all events annotated using this lexicon, serving both as a basis for theoretical studies in lexical semantics, translatology, corpus annotation studies, etc., and as a usable resource for training automatic semantic text processing systems for event/participant detection and linking and for general information extraction.
