A Malay Hadith translated document retrieval using parallel Latent Semantic Indexing (LSI)

Latent Semantic Indexing (LSI) is a well-known retrieval technique in which documents are retrieved based on the similarity of their content or meaning. LSI is effective at improving retrieval performance; however, as document collections grow larger, a faster technique is needed to process them. In this paper, a new parallel LSI algorithm that runs on a standard multi-core personal computer (PC) is proposed to improve the performance of retrieving relevant documents. The parallel LSI algorithm uses parallel threads to perform the matrix computations automatically using the Fork-Join approach. A total of 2028 text documents extracted from four volumes of the Malay-translated book of Hadith known as Shahih Bukhari were used as the test collection. We compare the time taken to create the LSI space by the sequential and parallel systems. Both systems are also evaluated on retrieving relevant documents using the Information Retrieval (IR) metrics of recall, precision, and effectiveness. The results show that the parallel system creates the LSI space faster than the sequential system. In terms of recall, precision, and effectiveness, the proposed parallel LSI system is comparable to the sequential LSI system.
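
As an illustration of the ideas named in the abstract, the following is a minimal sketch and not the algorithm proposed in the paper. It shows a Fork-Join style matrix product in which the rows of one matrix are split into blocks, each block is multiplied by a separate thread, and the partial results are joined in order. It also computes recall and precision for a retrieved set. All function names (`matmul_fork_join`, `recall_precision`, and so on) and the row-block partitioning are assumptions made for this example. The effectiveness measure is omitted because the abstract does not define it. In standard CPython the global interpreter lock limits any speedup for pure-Python arithmetic, so the code illustrates the structure only, not the timing gains reported above.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B, sequentially."""
    cols = list(zip(*b))  # columns of B, computed once per block
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def matmul_sequential(a, b):
    """Baseline: the whole product computed in a single thread."""
    return matmul_rows(a, b)


def matmul_fork_join(a, b, workers=4):
    """Fork-Join product (hypothetical helper, not the paper's code).

    Fork: split the rows of A into contiguous blocks, one task per block.
    Join: concatenate the partial products in their original order.
    """
    if not a:
        return []
    block = max(1, -(-len(a) // workers))  # ceiling division
    blocks = [a[i:i + block] for i in range(0, len(a), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so row order is kept.
        parts = pool.map(matmul_rows, blocks, [b] * len(blocks))
        return [row for part in parts for row in part]


def recall_precision(retrieved, relevant):
    """Return (recall, precision) for collections of document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    # Both versions should produce the same matrix.
    print(matmul_fork_join(a, b, workers=2) == matmul_sequential(a, b))
    print(recall_precision(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6]))
```

Because the blocks are disjoint and joined in submission order, the parallel product is identical to the sequential one. This is consistent with the abstract's observation that the two systems are comparable in retrieval quality and differ in processing time.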
