Towards Open Data for Linguistics: Linguistic Linked Data

‘Open Data’ has become very important in a wide range of fields. In linguistics, however, much data is still published in proprietary, closed formats and is not made available on the web. We propose the use of linked data principles to enable language resources to be published and interlinked openly on the web, and we describe the application of this paradigm to the modeling of two resources, WordNet and the MASC corpus. Here, WordNet and the MASC corpus serve as representative examples of two major classes of linguistic resources: lexical-semantic resources and annotated corpora, respectively. Furthermore, we argue that modeling and publishing language resources as linked data offers crucial advantages over existing formalisms. In particular, we explain how it can enhance the interoperability and integration of linguistic resources. Further benefits of this approach include the unambiguous identifiability of elements of linguistic description, the creation of dynamic but unambiguous links between different resources, the ability to query across distributed resources, and the availability of a mature technological infrastructure. Finally, we describe recent community activities.
