POND: Pessimistic-Optimistic oNline Dispatch

This paper considers constrained online dispatch with unknown arrival, reward, and constraint distributions. We propose a novel online dispatch algorithm, POND (Pessimistic-Optimistic oNline Dispatch), which achieves $O(\sqrt{T})$ regret and $O(1)$ constraint violation, where $T$ is the time horizon. Both bounds are sharp. Our experiments on synthetic and real datasets show that POND achieves low regret with minimal constraint violation.
