Across Languages and Genres: Creating a Universal Annotation Scheme for Textual Relations

The present paper describes an attempt to create an interoperable scheme using existing annotations of textual phenomena across languages and genres, including non-canonical ones. Such an analysis requires annotated multilingual resources, which are costly to create. Therefore, we make use of annotations already available in resources for English, German and Czech. As the annotations in these corpora are based on different conceptual and methodological backgrounds, we need an interoperable scheme that covers the existing categories and at the same time allows the resources to be compared. In this paper, we describe how this interoperable scheme was created and which problematic cases we had to consider. The resulting scheme is intended to be applied in future work to explore contrasts between the three languages under analysis, for which we expect the greatest differences in the degree of variation between non-canonical and canonical language.