A Parallel Latent Semantic Indexing (LSI) Algorithm for Malay Hadith Translated Document Retrieval

Latent Semantic Indexing (LSI) is a well-known search technique that matches queries to documents in information retrieval applications. LSI has been shown to improve retrieval performance; however, as document collections grow, current implementations are not fast enough to compute results on a standard personal computer. In this paper, we propose a new parallel LSI algorithm for standard personal computers with multi-core processors to speed up the retrieval of relevant documents. The proposed parallel LSI automatically runs the matrix computations of the LSI algorithm as parallel threads on multi-core processors, using the Fork-Join technique to execute the parallel tasks. We used the Malay translation of the Hadith of Shahih Bukhari, Jilid 1 through Jilid 4, as the test collection, comprising 2,028 text files in total. The processing time of the document pre-processing phase was measured for the proposed parallel LSI and compared with that of the sequential LSI algorithm. Our results show that pre-processing with the proposed parallel LSI system is faster than with the sequential system. Thus, the proposed parallel LSI algorithm improves searching time compared with the sequential LSI algorithm.
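To make the fork-join idea concrete, the following is a minimal sketch of how the pre-processing step that builds a term-document matrix could be split across worker threads. It is not the authors' implementation: the function names (`tokenize`, `count_chunk`, `build_term_document_matrix`), the whitespace tokenizer, and the use of Python's `concurrent.futures` in place of a dedicated Fork-Join framework are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real pre-processing)."""
    return text.lower().split()


def count_chunk(docs):
    """Task body: compute term counts for one chunk of documents."""
    return [Counter(tokenize(d)) for d in docs]


def build_term_document_matrix(docs, workers=4):
    """Fork one task per chunk, join the results, then assemble the matrix.

    Returns (vocab, matrix), where matrix[i][j] is the count of
    vocab[i] in docs[j].
    """
    if not docs:
        return [], []
    size = max(1, -(-len(docs) // workers))  # ceiling division
    chunks = [docs[i:i + size] for i in range(0, len(docs), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fork: submit one task per chunk.
        futures = [pool.submit(count_chunk, c) for c in chunks]
        # Join: collect results in submission order, preserving document order.
        counts = [c for f in futures for c in f.result()]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in vocab]
    return vocab, matrix
```

In CPython, threads do not speed up CPU-bound pure-Python work because of the global interpreter lock; swapping in `ProcessPoolExecutor` is one way to use multiple cores with the same fork-and-join structure. The sketch covers only the term-counting stage and leaves out the singular value decomposition that LSI applies to the resulting matrix.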
