Evaluating the Effectiveness of Thesaurus and Stemming Methods in Retrieving Malay Translated Al-Quran Documents

Information technology has made information in many forms, such as text, images, and sound, widely accessible through computer-based search. As this technology has advanced and grown in popularity, interest has increased in searching Malay text so that scholars and researchers can access databases online. Malay texts are scanned and stored in databases for use by text retrieval systems, which employ conflation methods to identify word variants in those databases. This paper evaluates the retrieval effectiveness of two conflation methods, stemming and thesaurus lookup, for searching and retrieving relevant Malay translated Al-Quran documents from users' natural-language query words. The Malay translated Al-Quran texts are stored in an inverted file structure. Retrieved documents are weighted and ranked using the inverse document frequency (idf) function. Retrieval effectiveness (E) is measured using standard recall (R) and precision (P). Experiments on the Malay translated Al-Quran documents show that combining stemming and thesaurus search improves retrieval effectiveness (E) and recall (R) but decreases precision (P).
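The pipeline described in the abstract (conflation, an inverted file, idf ranking, and recall/precision scoring) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English documents, the crude `stem` suffix stripper, the `THESAURUS` table, and the relevance judgments are hypothetical stand-ins, not the paper's Malay stemmer, thesaurus, or test collection. The idf form used, log(N / df), is one common variant and may differ from the one in the paper, and the E measure is left out because the abstract does not define it.

```python
import math
from collections import defaultdict

# Hypothetical toy collection (English stand-ins, not the paper's data).
DOCS = {
    1: "walker on the road",
    2: "they walk at dawn",
    3: "a stroll at night",
    4: "stroll past the market stalls",
    5: "rain on the road",
}
SUFFIXES = ("ers", "er", "ing", "s")   # assumed suffix list, illustration only
THESAURUS = {"walk": {"stroll"}}       # assumed synonym table, keyed by stem


def stem(word):
    """Crude suffix stripper standing in for a real stemming algorithm."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs, normalize):
    """Inverted file: normalized term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[normalize(token)].add(doc_id)
    return index


def search(terms, index, n_docs):
    """Rank documents by the summed idf of the query terms they contain."""
    scores = defaultdict(float)
    for term in set(terms):
        postings = index.get(term, set())
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id in postings:
            scores[doc_id] += idf
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_precision(retrieved, relevant):
    """Standard recall and precision; both are 0.0 when undefined."""
    hits = len(set(retrieved) & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def run(query, relevant, use_stem, use_thesaurus):
    """Run one query under a chosen conflation setting and score the result."""
    normalize = stem if use_stem else (lambda w: w)
    index = build_index(DOCS, normalize)
    terms = {normalize(w) for w in query.lower().split()}
    if use_thesaurus:
        for term in list(terms):
            terms |= {normalize(s) for s in THESAURUS.get(term, set())}
    retrieved = search(terms, index, len(DOCS))
    return retrieved, recall_precision(retrieved, relevant)


RELEVANT = {1, 2, 3}  # hypothetical relevance judgments for the query "walkers"
for label, flags in [("exact", (False, False)),
                     ("stemming", (True, False)),
                     ("stemming + thesaurus", (True, True))]:
    retrieved, (r, p) = run("walkers", RELEVANT, *flags)
    print(f"{label}: retrieved={retrieved} R={r:.2f} P={p:.2f}")
```

On this toy collection, adding the thesaurus to stemming retrieves one more relevant document and one non-relevant one, so recall rises while precision falls. That mirrors the direction of the reported result, but only because the toy data was constructed to show it.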
