Information Retrieval and Web Search

Information retrieval is the process of searching a document collection for the information most relevant to a user’s query. The type of document collection, however, significantly affects the methods and algorithms used to process queries. In this chapter we distinguish between two types of document collections: traditional and Web collections. Traditional information retrieval is search within small, controlled, nonlinked collections (e.g., a collection of medical or legal documents), whereas Web information retrieval is search within the world’s largest linked document collection. Despite the proliferation of the Web, traditional nonlinked collections still exist, and there is still a place for the older methods of information retrieval.
