Optimizing web search using web click-through data

The performance of web search engines can deteriorate because of the diverse and noisy information contained within web pages. User click-through data can be used to provide more accurate descriptions (metadata) for web pages and thereby improve search performance. However, three major challenges face research on mining user click-through logs: noise and incompleteness, sparseness, and the volatility of web pages and queries. In this paper, we propose a novel iterative reinforced algorithm that uses user click-through data to improve search performance. The algorithm fully explores the interrelations between queries and web pages, effectively finds "virtual queries" for web pages, and overcomes the challenges discussed above. Experimental results on a large set of MSN click-through log data show a significant improvement in search performance over both the naive query log mining algorithm and the baseline search engine.
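The abstract does not state the update rules of the iterative reinforced algorithm. To illustrate the general idea of queries and pages reinforcing each other's similarity, the sketch below assumes a SimRank-style mutual update on the bipartite query-page click graph. Under this update, two queries are similar if users click similar pages for them, and two pages are similar if they are clicked for similar queries. Candidate "virtual queries" for a page are then ranked by how strongly each query's clicks land on pages similar to it. This is a minimal sketch under stated assumptions and not the paper's method. The click format, `decay`, `iterations`, the scoring rule, the thresholds, and the names `bipartite_simrank` and `virtual_queries` are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def bipartite_simrank(clicks, decay=0.8, iterations=5):
    """Hypothetical SimRank-style reinforcement on a query-page click graph.

    clicks: dict mapping (query, page) -> click count (assumed input format).
    Returns (q_sim, p_sim), dicts keyed by (a, b) pairs of queries / pages.
    """
    q_pages, p_queries = defaultdict(set), defaultdict(set)
    for (q, p), n in clicks.items():
        if n > 0:
            q_pages[q].add(p)
            p_queries[p].add(q)
    queries, pages = sorted(q_pages), sorted(p_queries)
    # Start from identity: each node is fully similar only to itself.
    q_sim = {(a, b): float(a == b) for a, b in product(queries, repeat=2)}
    p_sim = {(i, j): float(i == j) for i, j in product(pages, repeat=2)}
    for _ in range(iterations):
        # Queries are similar if they lead to clicks on similar pages.
        new_q = {}
        for a, b in product(queries, repeat=2):
            pa, pb = q_pages[a], q_pages[b]
            total = sum(p_sim[(i, j)] for i in pa for j in pb)
            new_q[(a, b)] = 1.0 if a == b else decay * total / (len(pa) * len(pb))
        # Pages are similar if they are clicked for similar queries
        # (uses the previous q_sim, so both sides update simultaneously).
        new_p = {}
        for i, j in product(pages, repeat=2):
            qi, qj = p_queries[i], p_queries[j]
            total = sum(q_sim[(a, b)] for a in qi for b in qj)
            new_p[(i, j)] = 1.0 if i == j else decay * total / (len(qi) * len(qj))
        q_sim, p_sim = new_q, new_p
    return q_sim, p_sim


def virtual_queries(page, clicks, p_sim, top_k=3, min_score=0.1):
    """Rank queries for `page` by their click share on pages similar to it.

    A query can qualify even if it never produced a click on `page` itself.
    """
    totals = defaultdict(float)
    for (q, _), n in clicks.items():
        totals[q] += n
    scores = defaultdict(float)
    for (q, p), n in clicks.items():
        if totals[q] > 0:
            scores[q] += (n / totals[q]) * p_sim.get((page, p), 0.0)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [q for q, s in ranked if s >= min_score][:top_k]


# Toy click log (invented for illustration, not MSN data).
clicks = {
    ("jaguar car", "jaguar.com"): 5,
    ("jaguar car", "cars.example/jaguar"): 3,
    ("luxury car", "cars.example/jaguar"): 4,
    ("big cat", "zoo.example/jaguar"): 2,
}
q_sim, p_sim = bipartite_simrank(clicks)
print(virtual_queries("jaguar.com", clicks, p_sim))  # ['jaguar car', 'luxury car']
```

In this toy log, "luxury car" never led to a click on `jaguar.com`. It still becomes a virtual query for that page because it shares the clicked page `cars.example/jaguar` with "jaguar car". "big cat" stays excluded because its clicks never connect to the car pages.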
