Workload analysis and caching strategies for search advertising systems

Search advertising depends on accurate predictions of user behavior and interest. Today these predictions come from complex, computationally expensive machine learning algorithms that estimate the potential revenue gain of thousands of candidate advertisements per search query. Accurate estimates are important for revenue, but the computation is a substantial expense, e.g., 10% to 30% of total gross revenue. Caching the results of previous computations is one potential way to reduce this expense, but traditional domain-agnostic and revenue-agnostic caches cause substantial revenue loss. This paper presents three domain-specific caching mechanisms that successfully optimize for both cost and revenue. Simulations on a trace from the Bing advertising system show that a traditional cache can reduce cost by up to 27.7% but has a revenue impact as bad as -14.1%. In contrast, the proposed mechanisms can reduce cost by up to 20.6% while keeping the revenue impact between -1.3% and 0%. Based on Microsoft's earnings release for FY16 Q4, the traditional cache would reduce the net profit of Bing Ads by $84.9 to $166.1 million in the quarter, while our proposed cache could increase the net profit by $11.1 to $71.5 million.
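
The trade-off between cost reduction and revenue impact can be made concrete with a little arithmetic. The sketch below is not the paper's model; it assumes that net profit is gross revenue minus computation cost, that computation cost is a fixed share (10% to 30%) of gross revenue, and that each cache's largest cost reduction occurs together with its worst revenue impact. The function and variable names are hypothetical.

```python
# Minimal sketch under stated assumptions: profit = revenue - compute cost,
# with compute cost equal to cost_share * gross revenue before caching.

def net_profit_change(revenue_impact, cost_reduction, cost_share):
    """Return the change in net profit as a fraction of gross revenue.

    revenue_impact: fractional change in revenue caused by the cache (e.g. -0.141)
    cost_reduction: fraction of compute cost saved by the cache (e.g. 0.277)
    cost_share:     compute cost as a fraction of gross revenue (e.g. 0.10)
    """
    # Revenue changes by revenue_impact; cost falls by cost_reduction * cost_share.
    return revenue_impact + cost_reduction * cost_share


# Cost shares taken from the abstract's "10% to 30%" range.
COST_SHARES = (0.10, 0.30)


def profit_range(revenue_impact, cost_reduction):
    """Evaluate the profit change at both ends of the cost-share range."""
    return tuple(
        net_profit_change(revenue_impact, cost_reduction, share)
        for share in COST_SHARES
    )


# Assumed pairing of the abstract's extreme figures (not stated in the source).
traditional = profit_range(revenue_impact=-0.141, cost_reduction=0.277)
proposed = profit_range(revenue_impact=-0.013, cost_reduction=0.206)

for name, (low_share, high_share) in (("traditional", traditional),
                                      ("proposed", proposed)):
    print(f"{name}: {low_share:+.2%} at 10% cost share, "
          f"{high_share:+.2%} at 30% cost share")
```

Under these assumptions the traditional cache loses roughly 5.8% to 11.3% of gross revenue in net profit, while the proposed cache gains roughly 0.8% to 4.9%. The ratios between these ends are in line with the dollar ranges quoted above, which suggests (but does not confirm) that the paper's estimate follows similar reasoning.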
