Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

In search advertising, the search engine must select the most profitable advertisements to display. This task can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that naively applying MAB algorithms to advertisement selection produces two kinds of bias. The first is sample selection bias, which harms the search engine by decreasing its expected revenue. The second is "estimation of the largest mean" (ELM) bias, which harms the advertisers by increasing their game-theoretic player-regret. We then propose simple bias-correction methods that benefit both the search engine and the advertisers.
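The two biases named above can be reproduced with a small Monte Carlo simulation. The sketch below is an illustration under stated assumptions. It is not the paper's analysis or its bias-correction method. Ads are modeled as Bernoulli arms with hypothetical click-through rates (CTRs). The adaptive policy is plain greedy selection, chosen here only for simplicity. All function names and parameters are hypothetical.

The first function fixes the number of impressions per ad, so each individual CTR estimate is unbiased. Even so, the maximum of those estimates overshoots the true maximum, which is the ELM effect. The second function lets a greedy policy decide which ad to show. Ads that look unlucky early on stop being shown, so their low estimates are never corrected.

```python
import random


def sample_ctr(rng, p, n):
    """Empirical CTR from n independent Bernoulli(p) impressions."""
    return sum(rng.random() < p for _ in range(n)) / n


def elm_bias(true_ctrs, n_impressions, n_trials, seed=0):
    """Monte Carlo estimate of E[max_i hat_mu_i] - max_i mu_i.

    Every arm gets the same fixed number of impressions (no adaptivity),
    so each hat_mu_i is unbiased; any positive gap comes from the max alone.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += max(sample_ctr(rng, p, n_impressions) for p in true_ctrs)
    return total / n_trials - max(true_ctrs)


def greedy_selection_bias(true_ctrs, n_init, n_rounds, n_trials, seed=0):
    """Average over arms and trials of (hat_mu_i - mu_i) after greedy selection.

    Hypothetical policy: each arm gets n_init forced impressions, then every
    round shows the arm with the highest empirical CTR (ties go to the lowest
    index). Arms whose early estimates are low are rarely shown again, so
    those low estimates persist and pull the average estimate downward.
    """
    if n_init < 1:
        raise ValueError("n_init must be at least 1 to avoid dividing by zero")
    rng = random.Random(seed)
    k = len(true_ctrs)
    total = 0.0
    for _ in range(n_trials):
        clicks = [0] * k
        shows = [0] * k
        # Forced exploration: n_init impressions for every arm.
        for i, p in enumerate(true_ctrs):
            for _ in range(n_init):
                clicks[i] += int(rng.random() < p)
                shows[i] += 1
        # Greedy exploitation: always show the current empirical best.
        for _ in range(n_rounds):
            i = max(range(k), key=lambda j: clicks[j] / shows[j])
            clicks[i] += int(rng.random() < true_ctrs[i])
            shows[i] += 1
        total += sum(clicks[i] / shows[i] - true_ctrs[i] for i in range(k)) / k
    return total / n_trials


if __name__ == "__main__":
    print("ELM bias:", elm_bias([0.1] * 5, n_impressions=50, n_trials=1000))
    print("greedy selection bias:",
          greedy_selection_bias([0.3, 0.3], n_init=5, n_rounds=200, n_trials=1000))
```

With five identical ads, the ELM gap should come out positive. Under greedy selection, the average CTR estimate should come out below the true CTR. The exact magnitudes depend on the seed and on the hypothetical parameters. Identical arms are used because they make both effects easy to see.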
