Click-through prediction for news queries

A growing trend in commercial search engines is the display of specialized content, such as news and products, interleaved with web search results. Ideally, this content should be displayed only when it is highly relevant to the search query, as it competes for space with "regular" results and advertisements. One measure of relevance to the search query is the click-through rate the specialized content achieves when displayed; hence, if we can predict this click-through rate accurately, we can use the prediction as the basis for deciding when to show specialized content. In this paper, we consider the problem of estimating the click-through rate for dedicated news search results. For queries for which news results have been displayed repeatedly before, the click-through rate can be tracked online; the key remaining challenge is deciding for which previously unseen queries to display news results. We propose a supervised model that offers accurate prediction of news click-through rates and satisfies the requirement of adapting quickly to emerging news events.
