Online Learning of Optimal Bidding Strategy in Repeated Multi-Commodity Auctions

We study the online learning problem of a bidder who participates in repeated auctions. With the goal of maximizing his $T$-period payoff, the bidder determines, at each period, the optimal allocation of his budget among his bids for $K$ goods. As a bidding strategy, we propose a polynomial-time algorithm inspired by the dynamic programming approach to the knapsack problem. The proposed algorithm, referred to as dynamic programming on discrete set (DPDS), achieves a regret of order $O(\sqrt{T\log{T}})$. By showing that the regret of any strategy is lower bounded by $\Omega(\sqrt{T})$, we conclude that DPDS is order optimal up to a $\sqrt{\log{T}}$ factor. We evaluate the performance of DPDS empirically in the context of virtual trading in wholesale electricity markets, using historical data from the New York market. Empirical results show that DPDS consistently outperforms benchmark heuristic methods derived from machine learning and online learning approaches.
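To illustrate the knapsack-style dynamic program that the abstract alludes to, here is a minimal sketch of a single-period budget allocation over a discrete bid grid. It is not the authors' DPDS implementation: the function name `allocate_budget`, the integer bid grid `0..budget`, and the table `payoff[k][b]` (an estimated payoff for bidding `b` budget units on good `k`, for example an average over past periods) are all assumptions made for illustration.

```python
from typing import List, Tuple


def allocate_budget(payoff: List[List[float]], budget: int) -> Tuple[float, List[int]]:
    """Multiple-choice-knapsack-style DP (illustrative sketch).

    payoff[k][b] is an assumed estimate of the payoff from bidding b units
    on good k, for b = 0..budget. Returns the best total estimated payoff
    and one bid per good, with the bids summing to at most `budget`.
    """
    num_goods = len(payoff)
    neg_inf = float("-inf")
    # best[k][c]: max estimated payoff from the first k goods with total spend <= c.
    best = [[0.0] * (budget + 1)]
    best += [[neg_inf] * (budget + 1) for _ in range(num_goods)]
    # choice[k][c]: bid on good k that attains best[k + 1][c].
    choice = [[0] * (budget + 1) for _ in range(num_goods)]

    for k in range(num_goods):
        for c in range(budget + 1):
            for b in range(c + 1):  # try every affordable bid on good k
                value = best[k][c - b] + payoff[k][b]
                if value > best[k + 1][c]:
                    best[k + 1][c] = value
                    choice[k][c] = b

    # Backtrack from the full budget to recover one optimal allocation.
    bids = [0] * num_goods
    remaining = budget
    for k in range(num_goods - 1, -1, -1):
        bids[k] = choice[k][remaining]
        remaining -= bids[k]
    return best[num_goods][budget], bids


if __name__ == "__main__":
    # Two goods, budget of 3 units, made-up payoff estimates.
    estimates = [[0.0, 1.0, 1.5, 1.6], [0.0, 2.0, 2.1, 2.2]]
    print(allocate_budget(estimates, 3))
```

The sketch runs in $O(K B^2)$ time for $K$ goods and a grid of $B$ budget units, which is polynomial in both. How the grid is chosen and how the payoff estimates are updated across the $T$ periods is left out here, since the abstract does not specify it.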
