Online learning for auction mechanism in bandit setting

This paper is concerned with online learning of the optimal auction mechanism for sponsored search in a bandit setting. Previous works take the click-through rates of ads to be fixed and known to the search engine and use this information to design the optimal auction mechanism. However, this assumption is not practical, since ads can only receive clicks when they are shown to users. To tackle this problem, we propose to use online learning for auction mechanism design. Specifically, this task corresponds to a new type of bandit problem, which we call the armed bandit problem with shared information (AB-SI). In the AB-SI problem, the arm space (corresponding to the parameter space of the auction mechanism, which can be discrete or continuous) is partitioned into a finite number of clusters (corresponding to the finite number of rankings of the ads), and the arms in the same cluster share the explored information (i.e., the click-through rates of the ads in the same ranked list) when any arm from the cluster is pulled. We propose two upper-confidence-bound algorithms, called UCB-SI1 and UCB-SI2, to tackle this new problem in the discrete-armed and continuum-armed bandit settings, respectively. We show that when the total number of arms is finite, the regret bound obtained by the UCB-SI1 algorithm is tighter than that of the classical UCB1 algorithm. In the continuum-armed bandit setting, our proposed UCB-SI2 algorithm can handle a larger class of reward functions and achieves a regret bound of O(T^{2/3}(d ln T)^{1/3}), where d is the pseudo-dimension of the real-valued reward function class. Experimental results show that the proposed algorithms significantly outperform several classical online learning methods on synthetic data.

Highlights:
- We develop a new bandit problem, called the armed bandit problem with shared information (AB-SI).
- We propose two UCB-SI algorithms to handle the proposed problem.
- We show that for a finite number of arms, the regret of UCB-SI1 is better than that of UCB1.
- We show that UCB-SI2 can handle complex reward functions in the continuum-armed setting.
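The abstract does not spell out the UCB-SI1 update rule, but its core idea, that arms in the same cluster share the statistics gathered whenever any of them is pulled, can be illustrated with a small sketch. The code below is a hypothetical illustration, not the authors' algorithm: the class name UCBSharedInfo, the cluster-level UCB1-style index, and the random choice of an arm inside the selected cluster are all assumptions made for exposition.

```python
import math
import random


class UCBSharedInfo:
    """Illustrative UCB with cluster-level information sharing (not the paper's exact UCB-SI1).

    Arms in the same cluster share pull counts and reward estimates, so pulling
    any arm in a cluster updates the statistics used by every arm of that cluster.
    """

    def __init__(self, clusters):
        # clusters: list of lists; clusters[c] holds the arm indices of cluster c
        self.clusters = clusters
        self.cluster_of = {a: c for c, arms in enumerate(clusters) for a in arms}
        self.counts = [0] * len(clusters)          # pulls observed per cluster
        self.mean_rewards = [0.0] * len(clusters)  # shared empirical mean per cluster
        self.t = 0

    def select_arm(self):
        self.t += 1
        # Play each cluster once before applying the UCB index.
        for c, n in enumerate(self.counts):
            if n == 0:
                return self.clusters[c][0]

        # UCB1-style index computed at the cluster level (shared information).
        def index(c):
            return self.mean_rewards[c] + math.sqrt(2.0 * math.log(self.t) / self.counts[c])

        best_cluster = max(range(len(self.clusters)), key=index)
        # Every arm of the chosen cluster benefits from the shared statistics;
        # for illustration we simply pick one of them at random.
        return random.choice(self.clusters[best_cluster])

    def update(self, arm, reward):
        c = self.cluster_of[arm]
        self.counts[c] += 1
        # Incremental update of the cluster's empirical mean reward.
        self.mean_rewards[c] += (reward - self.mean_rewards[c]) / self.counts[c]


if __name__ == "__main__":
    # Toy run: 6 arms grouped into 3 clusters, Bernoulli rewards with made-up means.
    clusters = [[0, 1], [2, 3], [4, 5]]
    true_means = [0.2, 0.2, 0.5, 0.5, 0.8, 0.8]  # arms in a cluster share a mean here
    bandit = UCBSharedInfo(clusters)
    for _ in range(1000):
        arm = bandit.select_arm()
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        bandit.update(arm, reward)
    print(bandit.mean_rewards)
```

In this toy version, sharing information reduces the exploration burden from the number of arms to the number of clusters, which is one intuition for why the regret bound claimed for UCB-SI1 can be tighter than that of UCB1 when the arm set is finite.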
