Paper information - Exploitation and exploration in a performance based contextual advertising system

The dynamic marketplace in online advertising calls for ranking systems that are optimized to consistently promote and capitalize on better-performing ads. The streaming nature of online data inevitably forces an advertising system to choose between maximizing its expected short-term revenue according to its current knowledge (exploitation) and trying to learn more about the unknown to improve its knowledge (exploration), since the latter might increase its revenue in the future. The exploitation and exploration (EE) tradeoff has been extensively studied in the reinforcement learning community; however, it received little attention in online advertising until recently. In this paper, we develop two novel EE strategies for online advertising. Specifically, our methods can adaptively balance the two aspects of EE by automatically learning the optimal tradeoff and incorporating confidence metrics of historical performance. Within a deliberately designed offline simulation framework, we apply our algorithms to an industry-leading performance-based contextual advertising system and conduct extensive evaluations with real online event log data. The experimental results and detailed analysis reveal several important findings about EE behaviors in online advertising and demonstrate that our algorithms achieve superior ad reach and click-through rate (CTR).
