Creating Linked Data Morphological Language Resources with MMoOn - The Hebrew Morpheme Inventory

The development of standard models for describing general lexical resources has led to the emergence of numerous lexical datasets for various languages in the Semantic Web. However, equivalent models covering the linguistic domain of morphology do not exist. As a result, hardly any language resources of morphemic data are available in RDF to date. This paper presents the creation of the Hebrew Morpheme Inventory from a manually compiled tabular dataset comprising around 52,000 entries. It is an ongoing effort to represent the lexemes, word-forms and morphological patterns together with their underlying relations, based on the newly created Multilingual Morpheme Ontology (MMoOn). The paper shows how segmented Hebrew language data can be granularly described in a Linked Data format, thus serving as an exemplary case for creating morpheme inventories of any inflectional language with MMoOn. The resulting dataset is described a) according to the structure of the underlying data format, b) with respect to the Hebrew language characteristic of building word-forms directly from roots, c) by exemplifying how inflectional information is realized and d) with regard to its enrichment with external links to sense resources.
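To make the idea of turning a tabular entry into Linked Data more concrete, the following is a minimal sketch of how one row of a segmented dataset might be serialized as N-Triples. It is not the conversion used for the Hebrew Morpheme Inventory: the namespaces, the property names (`label`, `hasRoot`, `belongsToLexeme`, `consistsOfRoot`, `hasFeature`), the column names and the example row are all assumptions chosen for illustration and do not reproduce the actual MMoOn vocabulary.

```python
from urllib.parse import quote

# Hypothetical namespaces; these are NOT the real MMoOn or inventory IRIs.
BASE = "http://example.org/heb/inventory/"
VOCAB = "http://example.org/vocab#"


def iri(local, ns=BASE):
    """Wrap a local name in an N-Triples IRI reference."""
    return f"<{ns}{quote(local)}>"


def literal(text, lang="he"):
    """Build a language-tagged N-Triples literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def row_to_triples(row):
    """Turn one tabular entry (root, lexeme, word-form, features) into N-Triples lines.

    The property names below are illustrative placeholders only.
    """
    root = iri("root/" + row["root_id"])
    lexeme = iri("lexeme/" + row["lexeme_id"])
    form = iri("wordform/" + row["form_id"])
    triples = [
        (root, iri("label", VOCAB), literal(row["root"])),
        (lexeme, iri("label", VOCAB), literal(row["lexeme"])),
        # The lexeme is related to its consonantal root.
        (lexeme, iri("hasRoot", VOCAB), root),
        (form, iri("label", VOCAB), literal(row["form"])),
        (form, iri("belongsToLexeme", VOCAB), lexeme),
        # The word-form is related directly to the root it is built from.
        (form, iri("consistsOfRoot", VOCAB), root),
    ]
    # One triple per inflectional feature listed in the row.
    for feature in row["features"]:
        triples.append((form, iri("hasFeature", VOCAB), iri(feature, VOCAB)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    # Illustrative row: root k-t-b, lexeme "katav" (to write), form "katavti" (I wrote).
    example_row = {
        "root_id": "ktb", "root": "כתב",
        "lexeme_id": "katav", "lexeme": "כתב",
        "form_id": "katavti", "form": "כתבתי",
        "features": ["FirstPerson", "Singular", "Past"],
    }
    for line in row_to_triples(example_row):
        print(line)
```

Keeping the root, the lexeme and the word-form as separate resources is what allows a word-form to point both to its lexeme and directly to its root, which is the property of Hebrew the abstract highlights. A real conversion would replace the placeholder terms with the classes and properties defined by the ontology.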
