Paper information - Unity in Diversity: Integrating Differing Linguistic Data in TUSNELDA

Unity in Diversity: Integrating Differing Linguistic Data in TUSNELDA

This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. The collection contains a number of linguistically annotated corpora that differ in several respects, such as language, text type / data type, encoded annotation levels, and the linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and on how these heterogeneous data are integrated into a single resource on the other.

Andreas Wagner
