Linguistic Linked Open Data (LLOD). Introduction and Overview

The explosion of information technology has led to substantial growth in the quantity, diversity and complexity of linguistic data accessible over the internet. The lack of interoperability between linguistic and language resources is a major challenge, in particular if information from different sources, such as machine-readable lexicons, corpus data and terminology repositories, is to be combined. For these types of resources, domain-specific standards have been proposed. Yet issues of interoperability between different types of resources persist, commonly accepted strategies to distribute, access and integrate their information have yet to be established, and technologies and infrastructures to address both aspects are still under development.

The goal of the 2nd Workshop on Linked Data in Linguistics (LDL-2013) has been to bring together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections, including corpora, dictionaries, lexical networks, translation memories and thesauri, as well as the infrastructures developed on that basis, their use of existing standards, and the publication and distribution policies that were adopted.

Background: Integrating Information from Different Sources

In recent years, the limited interoperability between linguistic resources has been recognized as a major obstacle for data use and re-use within and across discipline boundaries.
After half a century of computational linguistics [8], quantitative typology [12], empirical, corpus-based study of language [10], and computational lexicography [16], researchers in computational linguistics, natural language processing (NLP) or information technology, as well as in the Digital Humanities, are confronted with an immense wealth of linguistic resources that are growing not only in number but also in heterogeneity. Interoperability involves two aspects [14]:

Structural (‘syntactic’) interoperability: Resources use comparable formalisms to represent and to access data (formats, protocols, query languages, etc.),

[1] E. Prud'hommeaux, et al. SPARQL query language for RDF, 2011.

[2] Nancy Ide, et al. What Does Interoperability Mean, Anyway? Toward an Operational Definition of Interoperability for Language Technology, 2010.

[3] Philipp Frischmuth, et al. Weaving a Distributed, Semantic Social Network for Mobile Users, 2011, ESWC.

[4] Aldo Gangemi, et al. The OntoWordNet Project: Extension and Axiomatization of Conceptual Relations in WordNet, 2003, OTM.

[5] W. J. Hutchins, et al. The Georgetown-IBM experiment demonstrated in January 1954, 2004, AMTA.

[6] Nancy Ide, et al. GrAF: A Graph-based Format for Linguistic Annotations, 2007, LAW@ACL.

[7] Christian Chiarcos. Ontologies of Linguistic Annotation: Survey and perspectives, 2012, LREC.

[8] Christian Chiarcos, et al. Interoperability of Corpora and Annotations, 2012, Linked Data in Linguistics.

[9] Sue Ellen Wright. A Global Data Category Registry for Interoperable Language Resources, 2004, LREC.

[10] Jens Lehmann, et al. Linked-Data Aware URI Schemes for Referencing Text Fragments, 2012, EKAW.

[11] Christiane Fellbaum, et al. Towards Open Data for Linguistics: Linguistic Linked Data, 2013, New Trends of Research in Ontologies and Lexical Resources.

[12] P. Davies. The American heritage dictionary of the English language, 1981.

[13] J. Greenberg. A Quantitative Approach to the Morphological Typology of Language, 1960, International Journal of American Linguistics.

[14] Menzo Windhouwer, et al. Linking to Linguistic Data Categories in ISOcat, 2012, Linked Data in Linguistics.

[15] Christian Chiarcos, et al. POWLA: Modeling Linguistic Corpora in OWL/DL, 2012, ESWC.

[16] S. Farrar, et al. Markup and the GOLD Ontology, 2003.

[17] Ulrich Heid, et al. Formalising Multi-layer Corpora in OWL DL - Lexicon Modelling, Querying and Consistency Control, 2008, IJCNLP.

[18] Mark Liberman, et al. A formal framework for linguistic annotation, 1999, Speech Commun.