How Linear Reward Helps in Online Resource Allocation

In this paper, we consider an online stochastic resource allocation problem whose underlying form is a linear program. We analyze an adaptive allocation algorithm and derive a constant regret bound that is independent of the number of time periods (equivalently, the number of decision variables), under the condition that the objective coefficient of the linear program is linear in the corresponding constraint coefficients. Furthermore, the constant regret bound does not require knowledge of the underlying distribution.
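
For concreteness, the setting can be sketched as the following linear program. The notation is assumed here for illustration and is not taken from the paper: $T$ is the number of time periods, $x_t$ the decision in period $t$, $r_t$ its objective coefficient, $a_t \in \mathbb{R}^m$ its constraint coefficients, $b \in \mathbb{R}^m$ the resource budget, and $\theta \in \mathbb{R}^m$ a hypothetical parameter vector expressing the linear-reward condition.

```latex
\begin{align*}
\max_{x_1,\dots,x_T} \quad & \sum_{t=1}^{T} r_t x_t \\
\text{s.t.} \quad & \sum_{t=1}^{T} a_t x_t \le b, \qquad 0 \le x_t \le 1, \quad t = 1,\dots,T, \\
\text{where} \quad & r_t = \theta^{\top} a_t \quad \text{(assumed form of the linear-reward condition).}
\end{align*}
% Regret, under the same assumed notation: the expected gap between the
% offline optimal value R_T^* and the reward R_T collected online.
\[
\mathrm{Regret}_T \;=\; \mathbb{E}\left[ R_T^{*} - R_T \right] \;\le\; C,
\qquad C \text{ independent of } T .
\]
```

In this reading, the pairs $(r_t, a_t)$ are drawn from an unknown distribution and revealed one at a time, and a "constant regret bound" means the constant $C$ does not grow as $T$ increases.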
