Better Caching in Search Advertising Systems with Rapid Refresh Predictions

To maximize profit and connect users to relevant products and services, search advertising systems use sophisticated machine learning algorithms to estimate the expected revenue of thousands of matching ad listings per query. These machine learning computations constitute a substantial part of the operating cost, e.g., 10% to 30% of total gross revenue. Caching and reusing previous computation results would reduce this cost, but caching introduces approximation, which carries potential revenue loss. To maximize cost savings while minimizing the overall revenue impact, an intelligent refresh policy is required to decide when to refresh the cached computation results. The state-of-the-art refresh heuristic is manually tuned and uses revenue history to assign different refresh frequencies. Using the gradient boosting regression tree algorithm with well-selected features, we introduce a rapid prediction framework that makes refresh decisions more accurately than the heuristic. This enables us to build a prediction-based refresh policy and a cache that achieves higher profit without manual parameter tuning. Simulations conducted on logs from a major commercial search advertising system show that, compared to the state-of-the-art manually tuned heuristic-based cache design, our proposed cache design reduces the negative revenue impact (0.07x) and improves the cost savings (1.41x) and the net profit (1.50x to 1.70x).
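The trade-off described above can be illustrated with a minimal sketch of a prediction-based refresh decision. This is not the paper's implementation: the class `PredictiveRefreshCache`, the callables `compute` and `predict_loss`, and the integer cost units are all hypothetical names chosen for illustration, and the age-based stub predictor merely stands in for a trained regression model such as gradient boosting regression trees. The cache recomputes an entry only when the predicted revenue loss from serving the stale value exceeds the cost of recomputation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    value: int  # cached revenue estimate (hypothetical integer units)
    age: int    # requests served from this entry since its last refresh


class PredictiveRefreshCache:
    """Refresh an entry only when predicted revenue loss exceeds compute cost."""

    def __init__(
        self,
        compute: Callable[[str], int],
        predict_loss: Callable[[str, CacheEntry], int],
        compute_cost: int,
    ) -> None:
        self.compute = compute            # the expensive model evaluation
        self.predict_loss = predict_loss  # stand-in for a trained predictor
        self.compute_cost = compute_cost  # cost of one recomputation
        self.entries: Dict[str, CacheEntry] = {}
        self.hits = 0
        self.refreshes = 0

    def get(self, key: str) -> int:
        entry = self.entries.get(key)
        # Recompute on a miss, or when staleness is predicted to cost
        # more revenue than the recomputation itself.
        if entry is None or self.predict_loss(key, entry) > self.compute_cost:
            value = self.compute(key)
            self.entries[key] = CacheEntry(value=value, age=0)
            self.refreshes += 1
            return value
        entry.age += 1
        self.hits += 1
        return entry.value


def stub_predict_loss(key: str, entry: CacheEntry) -> int:
    # Assumption for illustration only: predicted loss grows linearly with age.
    return 10 * entry.age


def stub_compute(key: str) -> int:
    # Assumption for illustration only: a fixed revenue estimate per query.
    return len(key) * 100
```

With `compute_cost=25`, five consecutive requests for the same key trigger a recomputation on the first request (a miss) and on the fifth (predicted loss 30 exceeds 25), while the three requests in between are served from the cache. In a real system the predictor's features and the cost units would come from production logs rather than from a fixed stub.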
