Analysis of various information retrieval models

Information retrieval is the automated extraction of information from various sources. It has long been a key area of research because of its wide range of applications, such as document search and software traceability. In this paper, we analyze three basic information retrieval models: the Vector Space Model, the Latent Dirichlet Allocation model, and the Latent Semantic Indexing model. Each model is explained in terms of its basic concepts, methodology, and areas of application. We compare the advantages of the models relative to one another and discuss their limitations. We also highlight the basic categories of information and describe variants of some of the basic models according to their concept and usage. This analysis is intended to help users choose the information retrieval model best suited to a particular problem or application.
