The sum of its parts: reducing sparsity in click estimation with query segments

Predicting clicks on search advertisements is a critical task that is typically addressed by learning from historical click data. When enough history has been observed for a given query-ad pair, future clicks can be modeled accurately. However, the empirical distribution of queries is such that sufficient history is unavailable for many query-ad pairs. This sparsity of data for new and rare queries makes it difficult to estimate clicks accurately for a significant portion of typical search engine traffic. In this paper we provide an analysis that motivates modeling approaches for reducing the sparsity of the large space of user search queries. We then propose methods that improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a conditional random field (CRF) model. The new models show significant improvements in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported from live traffic of a commercial search engine, in addition to results from offline evaluation.
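
To illustrate the idea of aggregating click history over individual query words, the following is a minimal sketch and not the paper's actual model. It assumes a toy log of (query, ad, impressions, clicks) tuples, and the function names `aggregate_by_word` and `word_backoff_ctr`, the smoothing parameters `prior_ctr` and `prior_weight`, and the simple averaging of word-level estimates are all hypothetical choices made for illustration. The CRF phrase extraction is not shown; whitespace-separated words stand in for query segments.

```python
from collections import defaultdict


def aggregate_by_word(log):
    """Sum impressions and clicks per (word, ad) pair across all queries."""
    stats = defaultdict(lambda: [0, 0])  # (word, ad) -> [impressions, clicks]
    for query, ad, impressions, clicks in log:
        # Count each distinct word of the query once per log record.
        for word in set(query.lower().split()):
            stats[(word, ad)][0] += impressions
            stats[(word, ad)][1] += clicks
    return stats


def word_backoff_ctr(query, ad, stats, prior_ctr=0.02, prior_weight=10.0):
    """Average smoothed word-level CTRs for a query; return the prior if no word was seen.

    prior_ctr and prior_weight are illustrative values, not taken from the paper.
    """
    estimates = []
    for word in set(query.lower().split()):
        if (word, ad) in stats:
            impressions, clicks = stats[(word, ad)]
            # Additive smoothing toward the prior keeps low-count words from dominating.
            smoothed = (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)
            estimates.append(smoothed)
    return sum(estimates) / len(estimates) if estimates else prior_ctr


# Toy click log: (query, ad, impressions, clicks). The data is invented for the example.
log = [
    ("cheap flights paris", "ad_travel", 1000, 50),
    ("flights to rome", "ad_travel", 500, 20),
    ("paris hotels", "ad_hotel", 800, 16),
]
stats = aggregate_by_word(log)

# "flights london" never appears in the log, but "flights" has history with ad_travel.
estimate = word_backoff_ctr("flights london", "ad_travel", stats)
fallback = word_backoff_ctr("garden tools", "ad_travel", stats)
```

Here the unseen query "flights london" still receives an estimate from the pooled history of the word "flights", while a query with no known words falls back to the prior. In a learned click model, such word-level statistics would more plausibly be supplied as features than used directly as the prediction.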
