Online Resource Allocation with Stochastic Resource Consumption

We consider an online resource allocation problem in which multiple resources, each with its own initial capacity, serve random requests that arrive sequentially over discrete time periods. In each period, one request arrives, and its reward and size are drawn independently from a known distribution that may be resource-dependent. Once the request arrives and its reward and size are revealed, the decision maker must decide online whether to accept it. If the request is accepted, the decision maker must also choose which resource serves it; an amount of that resource equal to the size of the request is then consumed and the reward is collected. The objective is to maximize the total collected reward subject to the capacity constraints. We develop near-optimal policies under two settings. When the reward distribution has finite support, we assume that, given each reward realization, the size is distributed as a deterministic weight multiplied by a one-dimensional random variable, and we propose an adaptive threshold policy. We show that, under certain regularity conditions on the size distribution, this policy achieves an optimal $O(\log T)$ regret bound, where $T$ denotes the total number of time periods. When the support of the reward distribution is not necessarily finite, we develop another adaptive threshold policy that achieves an $O(\log T)$ regret bound when both the reward and the size of each request are resource-independent and the size distribution satisfies the same conditions.
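The accept-and-assign dynamics described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names `simulate`, `sample_request`, and `adaptive_threshold`, the reward-per-unit-size acceptance rule, and the numerical values are all hypothetical placeholders chosen for illustration. It is not the adaptive threshold policy analyzed in the paper, and it carries no regret guarantee.

```python
import random


def simulate(T, capacities, sample_request, threshold, seed=0):
    """Run one sample path of the online accept/assign process.

    sample_request(rng) returns one (reward, size) pair per resource.
    threshold(t, T, remaining_j) returns the minimum reward per unit of
    size required to accept on resource j (a hypothetical rule).
    """
    rng = random.Random(seed)
    remaining = list(capacities)
    total_reward = 0.0
    for t in range(T):
        # One request arrives per period; reward and size may depend on the resource.
        offers = sample_request(rng)
        best = None
        for j, (reward, size) in enumerate(offers):
            if size > remaining[j]:
                continue  # accepting would violate the capacity constraint
            density = reward / size
            if density >= threshold(t, T, remaining[j]):
                if best is None or density > best[0]:
                    best = (density, j, reward, size)
        if best is not None:
            # Accept: consume capacity equal to the size and collect the reward.
            _, j, reward, size = best
            remaining[j] -= size
            total_reward += reward
    return total_reward, remaining


def sample_request(rng):
    # Assumed distribution: finite-support rewards, strictly positive sizes,
    # drawn separately for each of two resources.
    return [(rng.choice([1.0, 2.0, 3.0]), rng.uniform(0.5, 1.5)) for _ in range(2)]


def adaptive_threshold(t, T, remaining_j):
    # Placeholder rule: be more selective when the capacity left per
    # remaining period is small.
    capacity_per_period = remaining_j / (T - t)
    return 1.0 if capacity_per_period >= 0.5 else 2.0


if __name__ == "__main__":
    reward, remaining = simulate(50, [10.0, 10.0], sample_request, adaptive_threshold)
    print(reward, remaining)
```

The threshold here is recomputed every period from the remaining capacity and the remaining horizon, which is the sense in which it is "adaptive"; any other rule with the same signature can be passed to `simulate` in its place.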
