Paper information - Comparative Study of Verse Similarity for Multi-lingual Representations of the Qur’an

Text similarity has received considerable attention in recent years. However, applying text similarity tools to Semitic languages such as Arabic poses unique challenges. Moreover, the growing number of texts available online, not only in their native languages but also in translation, makes it harder to identify similar portions of text across different documents. In this paper, we explore the problem of text similarity in the context of multi-lingual representations of the Qur’an. In particular, we use Arabic and English datasets of the Qur’an for a comparative study and analysis of several similarity measures applied across different representations of its verses. We provide insights into the impact of applying different similarity measures to different features across different representations and linguistic characteristics of similar text.

A. Basharat | D. Yazdansepas | K. Rasheed
