From Language Documentation Data to LLOD: A Case Study in Turkic Lemon Dictionaries

In this paper, we describe the Lemon-OntoLex modeling of dictionaries created within language documentation efforts. We focus on exemplary resources for two less-resourced languages from the Turkic language family, Chalkan and Tuvan. Both datasets have been converted into a Linked Data representation using the Lemon-OntoLex data model, with an extensible converter written in Python. We compare the conversion process for the two lexical resources, analyze the difficulties we encountered, and discuss the cases that most commonly caused problems. Furthermore, we evaluate the quality of the converted dictionaries using specially designed SPARQL queries and by manually checking random samples of the data. Finally, we describe the future application of this data within a lexicographic-comparative workbench designed to facilitate language contact studies.