Annohub - Annotation Metadata for Linked Data Applications

We introduce a new dataset for the Linguistic Linked Open Data (LLOD) cloud that will provide metadata about annotation and language information harvested from annotated language resources, such as corpora, that are freely available on the internet. To our knowledge, annotation metadata is not yet provided by any metadata provider, e.g. linghub, datahub or CLARIN. On the other hand, the language metadata found on such portals is rarely provided in machine-readable form, especially as Linked Data. In this paper, we describe the harvesting process, the content and structure of the new dataset, and its application in the Lin|gu|is|tik portal, a research platform for linguists. In addition, we introduce tools for converting XML-encoded language resources to the CoNLL format. The generated RDF data and the XML converter application are made public under an open license.

Luis Glaser | Christian Fäth | Frank Abromeit
