Improving web search relevance and freshness with content previews

Traditional web search engines struggle to deliver good search quality for recency-sensitive queries because they are prone to delays in discovering, indexing, and ranking new web pages. In this paper we introduce PreGen, an adaptive preview generation system that runs as part of a web search engine to improve result quality for such queries. PreGen uses a machine learning algorithm to classify and select live web feeds, and generates "previews" of new web pages from the link descriptions available in these feeds. The search engine can then index relevant page previews and present them in its search results before the pages themselves are fetched from the web, thereby reducing end-to-end delays. Our experiments show that PreGen improves the search relevance of a state-of-the-art search engine for recency-sensitive queries by 3% and reduces the average latency of affected documents by 50%.
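To make the preview idea concrete, the following is a minimal sketch of how a preview record might be built from the link descriptions in a feed. It is an illustration under stated assumptions, not PreGen's implementation: it assumes an RSS 2.0 feed, and the names `Preview`, `generate_previews`, and `max_snippet_chars` are hypothetical. The machine-learned feed classification and selection step is not modeled here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Preview:
    """Hypothetical preview record: an indexable stand-in for an unfetched page."""
    url: str
    title: str
    snippet: str


def generate_previews(feed_xml: str, max_snippet_chars: int = 200) -> List[Preview]:
    """Build one preview per feed item that carries a link (assumes RSS 2.0)."""
    root = ET.fromstring(feed_xml)
    previews = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            # Without a target URL there is no page to preview.
            continue
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        previews.append(Preview(link, title, description[:max_snippet_chars]))
    return previews


# Made-up sample feed, used only to exercise the sketch.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example News</title>
<item>
  <title>Storm hits coast</title>
  <link>http://example.com/storm</link>
  <description>A storm made landfall this morning.</description>
</item>
<item>
  <title>Item without a link</title>
  <description>Skipped.</description>
</item>
</channel></rss>"""

previews = generate_previews(SAMPLE_FEED)
```

In this sketch the preview text comes only from the feed entry, so a record can be handed to an indexer before the linked page is ever crawled, which is the source of the latency reduction the abstract describes.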
