LIDIOMS: A Multilingual Linked Idioms Data Set

In this paper, we describe the LIDIOMS data set, a multilingual RDF representation of idioms that currently covers five languages: English, German, Italian, Portuguese, and Russian. The data set is intended to support natural language processing applications by providing links between idioms across languages. The underlying data was crawled from various sources and then integrated. To ensure the quality of the crawled data, every idiom was evaluated by at least two native speakers. Herein, we present the model devised for structuring the data. We also describe how LIDIOMS is linked to well-known multilingual data sets such as BabelNet. The resulting data set complies with the best practices of the Linguistic Linked Open Data community.
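
The abstract does not spell out the LIDIOMS model, so the following is only a sketch of what "linking idioms across languages" in RDF can look like. It assumes an OntoLex-Lemon-style encoding (`ontolex:LexicalEntry`, `ontolex:canonicalForm`, `ontolex:writtenRep` and `vartrans:translatableAs` are terms from the W3C OntoLex-Lemon vocabulary). It also assumes a hypothetical `http://example.org/lidioms/` namespace and hypothetical helper names (`idiom`, `link`, `equivalents`). The three equivalent idioms are well known and chosen purely for illustration; they are not taken from the data set. Triples are plain Python tuples, so no RDF library is needed.

```python
from typing import NamedTuple

# Real vocabulary namespaces (RDF, OntoLex-Lemon core, OntoLex variation & translation).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
VARTRANS = "http://www.w3.org/ns/lemon/vartrans#"
# ASSUMPTION: hypothetical namespace; this is not the published LIDIOMS URI scheme.
EX = "http://example.org/lidioms/"


class Literal(NamedTuple):
    """A language-tagged literal such as "..."@en."""
    text: str
    lang: str


def idiom(key: str, text: str, lang: str) -> list:
    """Hypothetical helper: one idiom as a LexicalEntry with a canonical written form."""
    entry, form = EX + key, EX + key + "_form"
    return [
        (entry, RDF_TYPE, ONTOLEX + "LexicalEntry"),
        (entry, ONTOLEX + "canonicalForm", form),
        (form, ONTOLEX + "writtenRep", Literal(text, lang)),
    ]


def link(a: str, b: str) -> tuple:
    """Hypothetical helper: a cross-language link between two idiom entries."""
    return (EX + a, VARTRANS + "translatableAs", EX + b)


def written_rep(graph: list, entry: str) -> Literal:
    """Follow entry -> canonicalForm -> writtenRep."""
    form = next(o for s, p, o in graph if s == entry and p == ONTOLEX + "canonicalForm")
    return next(o for s, p, o in graph if s == form and p == ONTOLEX + "writtenRep")


def equivalents(graph: list, key: str, lang: str) -> list:
    """Written forms in `lang` of idioms directly linked to `key`.

    ASSUMPTION: links are read in both directions, but chains of links are not followed.
    """
    entry = EX + key
    pred = VARTRANS + "translatableAs"
    linked = {o for s, p, o in graph if s == entry and p == pred}
    linked |= {s for s, p, o in graph if o == entry and p == pred}
    reps = [written_rep(graph, e) for e in linked]
    return sorted(r.text for r in reps if r.lang == lang)


# Illustrative idioms only, not entries copied from LIDIOMS.
graph = (
    idiom("en_two_birds", "kill two birds with one stone", "en")
    + idiom("de_zwei_fliegen", "zwei Fliegen mit einer Klappe schlagen", "de")
    + idiom("pt_dois_coelhos", "matar dois coelhos com uma cajadada só", "pt")
    + [link("en_two_birds", "de_zwei_fliegen"), link("pt_dois_coelhos", "en_two_birds")]
)

print(equivalents(graph, "en_two_birds", "pt"))  # ['matar dois coelhos com uma cajadada só']
```

Because the lookup is not transitive, the German entry reaches the Portuguese one only if the two are linked directly. A real data set might instead link every pair of equivalents or compute the closure when it is loaded.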