Paper information - POND: Pessimistic-Optimistic oNline Dispatch

This paper considers constrained online dispatch with unknown arrival, reward and constraint distributions. We propose a novel online dispatch algorithm, named POND, standing for Pessimistic-Optimistic oNline Dispatch, which achieves $O(\sqrt{T})$ regret and $O(1)$ constraint violation. Both bounds are sharp. Our experiments on synthetic and real datasets show that POND achieves low regret with minimal constraint violations.
