Effectiveness of query expansion in searching the Holy Quran

Modern Arabic text is written without diacritical marks (short vowels), which causes considerable word-level ambiguity in the absence of context. An exception is the Holy Quran, which is written with short vowels and other marks that preserve its pronunciation and, hence, the correct sense of its words. Searching for a word in vowelized text requires typing and matching all of its diacritical marks, which is cumbersome and discourages learners from searching and understanding the text. The alternative, ignoring these marks, brings back the problem of ambiguity. In this paper, we present a novel diacritic-less search approach that retrieves from the Quran the verses relevant to a user's query through automatic query expansion. The proposed approach uses a relational database search engine that is scalable, portable across RDBMS platforms, and provides fast and sophisticated retrieval. We present the results and outline the future directions for search engines that the applied approach suggests.
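
The idea of diacritic-less search with query expansion can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical relational schema (tables `verse` and `word`), hypothetical helper functions (`strip_diacritics`, `build_index`, `expand_query`, `search`), and an assumed diacritic set. The toy "verses" are illustrative phrases, not Quranic text. Each word is stored together with its bare (unvowelized) form. A query typed without diacritics is expanded into every vowelized form in the index that shares its skeleton, and the user can then narrow the match to the intended sense.

```python
import re
import sqlite3

# Assumed diacritic set: tanween, short vowels, shadda and sukun (U+064B-U+0652)
# plus the superscript alef (U+0670). A full Quranic index would likely need
# further marks (e.g. the small high annotation signs around U+06D6-U+06ED).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Hypothetical toy data: short vowelized phrases, NOT Quranic verses.
TOY_VERSES = [
    (1, "عِلْمٌ نَافِعٌ"),   # "beneficial knowledge"
    (2, "عَلِمَ زَيْدٌ"),    # "Zayd knew"
    (3, "كِتَابٌ جَدِيدٌ"),  # "a new book"
]


def strip_diacritics(text: str) -> str:
    """Remove the assumed diacritics, leaving the bare consonantal skeleton."""
    return DIACRITICS.sub("", text)


def build_index(verses):
    """Build an in-memory index with one row per (vowelized form, bare form, verse id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE verse (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute("CREATE TABLE word (form TEXT, bare TEXT, verse_id INTEGER)")
    # An index on the bare column makes diacritic-less lookup a plain equality query.
    conn.execute("CREATE INDEX idx_word_bare ON word(bare)")
    for vid, text in verses:
        conn.execute("INSERT INTO verse VALUES (?, ?)", (vid, text))
        for form in text.split():
            conn.execute(
                "INSERT INTO word VALUES (?, ?, ?)",
                (form, strip_diacritics(form), vid),
            )
    conn.commit()
    return conn


def expand_query(conn, query):
    """Expand a (possibly unvowelized) query word into all indexed vowelized forms
    that share its bare skeleton."""
    rows = conn.execute(
        "SELECT DISTINCT form FROM word WHERE bare = ? ORDER BY form",
        (strip_diacritics(query),),
    )
    return [r[0] for r in rows]


def search(conn, query, forms=None):
    """Return ids of verses matching the expanded query. If `forms` is given,
    only those expansions (e.g. the sense a user picked) are kept."""
    expansions = expand_query(conn, query)
    if forms is not None:
        wanted = set(forms)
        expansions = [f for f in expansions if f in wanted]
    if not expansions:
        return []
    placeholders = ",".join("?" * len(expansions))
    rows = conn.execute(
        f"SELECT DISTINCT verse_id FROM word WHERE form IN ({placeholders}) "
        "ORDER BY verse_id",
        expansions,
    )
    return [r[0] for r in rows]
```

With this toy index, the bare query "علم" expands to both "عِلْمٌ" (knowledge) and "عَلِمَ" (he knew), so it matches verses 1 and 2. Passing `forms=["عِلْمٌ"]` narrows the result to verse 1. Storing the bare form in its own indexed column keeps the lookup to standard SQL, which is one plausible way to stay portable across RDBMS platforms.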
