In this paper we describe the Semantic Quran dataset, a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two semi-structured sources and aligning it to an ontology designed to represent multilingual data from hierarchically structured sources. The resulting RDF data covers 43 languages that are among the most under-represented in the Linked Data Cloud, including Arabic, Amharic and Amazigh. We designed the dataset to be easily usable in natural-language processing applications, with the goal of facilitating the development of knowledge extraction tools for these languages. In particular, the Semantic Quran is compatible with the Natural-Language Interchange Format (NIF) and contains explicit morpho-syntactic information on the terms used in the text. We present the ontology devised for structuring the data and the transformation rules implemented in our extraction framework. Finally, we detail the link creation process and possible usage scenarios for the Semantic Quran dataset.