Lemon-aid: using Lemon to aid quantitative historical linguistic analysis

In this short paper, we describe how we converted dictionary and wordlist data made available by the QuantHistLing project into the Lexicon Model for Ontologies (Lemon). The conversion expresses the lexical data in the RDF model that Lemon specifies, which lets us use Linked Data to combine more than fifty disparate lexicons and dictionaries. The resulting Linked Data resource, which we call the QHL dataset, provides researchers with a translation graph that they can query across the underlying lexicons and dictionaries to extract semantically aligned wordlists.
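To make the idea of a translation graph concrete, the following is a minimal sketch in plain Python, not the project's actual conversion code. It assumes that each dictionary entry pairs a head word with a gloss in a shared pivot language, and it represents RDF-style triples as tuples. The entries, the source and language names, and the predicate names (`language`, `writtenRep`, `sense`) are hypothetical, simplified stand-ins; in Lemon proper, the written form is attached to a form object rather than directly to the lexical entry.

```python
from collections import defaultdict

# Hypothetical sample data: (source dictionary, language, head word, pivot gloss).
# The forms are placeholders, not real words from the QHL dataset.
ENTRIES = [
    ("dict_a", "lang_x", "aaa", "water"),
    ("dict_b", "lang_y", "bbb", "water"),
    ("dict_b", "lang_y", "ccc", "fire"),
]


def to_triples(entries):
    """Turn dictionary entries into (subject, predicate, object) triples.

    Entries that share a gloss point at the same sense node, which is
    what links otherwise unrelated dictionaries into one graph.
    """
    triples = []
    for index, (source, language, form, gloss) in enumerate(entries):
        entry = f"{source}/entry/{index}"
        triples.append((entry, "language", language))
        triples.append((entry, "writtenRep", form))
        triples.append((entry, "sense", f"sense/{gloss}"))
    return triples


def aligned_wordlist(triples, concepts):
    """For each concept, collect the forms of every language that has an
    entry linked to that concept's sense node."""
    language_of = {}
    form_of = {}
    entries_by_sense = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "language":
            language_of[subject] = obj
        elif predicate == "writtenRep":
            form_of[subject] = obj
        elif predicate == "sense":
            entries_by_sense[obj].append(subject)

    rows = {}
    for concept in concepts:
        row = defaultdict(list)
        for entry in entries_by_sense.get(f"sense/{concept}", []):
            row[language_of[entry]].append(form_of[entry])
        rows[concept] = dict(row)
    return rows


wordlist = aligned_wordlist(to_triples(ENTRIES), ["water", "fire"])
print(wordlist)
```

In a real RDF store the same lookup would typically be written as a query over the graph; the dictionary-based traversal here only illustrates how shared sense nodes allow wordlists to be aligned across sources.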
