Constant Regret in Online Allocation: On the Sufficiency of a Single Historical Trace

We consider online decision-making problems where resources are allocated dynamically to a stochastic stream of requests, and decisions are made to maximize reward while satisfying a set of constraints. We propose and analyze a simple algorithm that uses only historical data, i.e., traces (sample paths) of the stochastic process. We prove that, in a large family of problems that includes online packing and online matching as special cases, the algorithm has near-optimal performance with the minimum possible sample complexity; in particular, it obtains constant regret with as few as one trace. The algorithm is agnostic to the generative model of arrivals; moreover, the results hold even under time-varying and correlated arrival processes. Finally, even in settings beyond our theoretical guarantees, our framework generates data-friendly algorithms that match or beat the performance of specialized state-of-the-art algorithms in simulations.
