An Efficient LSI-Based Information Retrieval Framework Using Particle Swarm Optimization and Simulated Annealing Approach

The number of users and the amount of available information have exploded since the advent of the World Wide Web (WWW). Most Web users rely on search engines to find specific information. A key factor in the success of Web search engines is their ability to rapidly return good-quality results for queries based on specific terms. This paper aims to retrieve more relevant documents from a huge corpus for a given information need. We propose a text mining framework that consists of four distinct stages: (1) text preprocessing; (2) dimensionality reduction using latent semantic indexing (LSI); (3) clustering based on a hybrid combination of particle swarm optimization (PSO) and the k-means algorithm; and (4) information retrieval using simulated annealing (SA). The framework returns more relevant documents to the user and fewer irrelevant ones.
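As an illustration of stage 3, the following is a minimal sketch of a hybrid PSO + k-means clustering step, not the paper's implementation. It assumes each document has already been reduced to a low-dimensional vector (for example by LSI), that PSO runs first to find a good set of starting centroids, and that k-means then refines them. The fitness function (sum of squared Euclidean distances), the PSO parameters `w`, `c1`, `c2`, and all function names are assumptions chosen for the example; the abstract does not specify them.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def sse(points, centroids):
    """Assumed fitness: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def pso_centroids(points, k, rng, n_particles=10, iters=30,
                  w=0.72, c1=1.49, c2=1.49):
    """Global search: each particle encodes one candidate set of k centroids."""
    dim = len(points[0])
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_f = [sse(points, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = [c[:] for c in pbest[g]], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = sse(points, pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = [c[:] for c in pos[i]], f
                if f < gbest_f:
                    gbest, gbest_f = [c[:] for c in pos[i]], f
    return gbest

def kmeans(points, centroids, iters=20):
    """Local refinement: standard k-means starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, assign(points, centroids)

def hybrid_pso_kmeans(points, k, seed=0):
    """PSO picks the starting centroids; k-means converges from there."""
    rng = random.Random(seed)
    return kmeans(points, pso_centroids(points, k, rng))

if __name__ == "__main__":
    # Toy 2-D "document vectors"; real input would be LSI-reduced vectors.
    docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = hybrid_pso_kmeans(docs, k=2)
    print(labels, centroids)
```

The split reflects the usual motivation for such hybrids: PSO explores the centroid space globally and is less sensitive to initialization, while k-means converges quickly once it starts near a good solution. For text data, a cosine-based distance may be a better fit than the Euclidean distance used here.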
