Comparing Czech and English AMRs

This paper describes in detail the differences between Czech and English annotation using the Abstract Meaning Representation (AMR) scheme, which stresses the use of ontologies (and semantically oriented verbal lexicons) and relations based on meaning or ontological content rather than semantics or syntax. The basic “slogan” of the AMR specification clearly states that AMR is not an interlingua, yet many relations, as well as the structures constructed from them, are expected to be similar or even identical across languages. In our study, we investigated 100 English sentences and their Czech translations, manually annotated with AMRs, with the goal of describing the differences and, where possible, classifying them into two main categories. The first comprises differences of mere convention, which can be unified by changing those conventions in the AMR annotation guidelines. The second comprises differences so deeply rooted in the language structure that the level of abstraction inherent in the current AMR scheme does not allow for such unification.
