Comparative Study of Verse Similarity for Multi-lingual Representations of the Qur’an

Text similarity has received considerable attention in recent years. However, applying text similarity tools to Semitic languages such as Arabic poses unique challenges. Moreover, as more texts become available online, not only in their native languages but also in translation, identifying similar portions of text across different documents becomes more difficult. In this paper, we explore the problem of text similarity in the context of multi-lingual representations of the Qur’an. In particular, we use Arabic and English datasets of the Qur’an for a comparative study and analysis of several similarity measures applied across different representations of its verses. Our analysis offers insights into how the choice of similarity measure, features, and representation affects the results, and into the linguistic characteristics of similar text.