The Bayesian Prophet: A Low-Regret Framework for Online Decision Making

Motivated by the success of black-box predictive algorithms as subroutines for online decision-making, we develop a new framework for designing online policies given access to an oracle that provides statistical information about an offline benchmark. Access to such prediction oracles enables simple and natural Bayesian selection policies, and raises the question of how these policies perform in different settings. Our work makes two contributions towards answering this question. First, we develop a general technique we call *compensated coupling*, which can be used to derive bounds on the expected regret (i.e., additive loss with respect to a benchmark) for any online policy and offline benchmark. Second, using this technique, we show that the Bayes Selector has constant expected regret (i.e., independent of the number of arrivals and resource levels) in any online packing and matching problem with a finite type-space. Our results generalize and simplify many existing results for online packing and matching problems, and suggest a promising pathway for obtaining oracle-driven policies for other online decision-making settings.
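To make the idea of an oracle-driven selection policy concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy multi-secretary instance: i.i.d. arrivals from a finite set of reward types, a budget of items that may be accepted, and an offline benchmark that keeps the largest rewards in hindsight. The prediction oracle is replaced by a Monte Carlo estimate, and the rule "accept when the offline benchmark would accept with probability at least 1/2" is one natural reading of a Bayesian selection policy. All function names and parameters here are hypothetical.

```python
import random


def offline_value(rewards, budget):
    """Hindsight benchmark (assumed): keep the `budget` largest rewards."""
    return sum(sorted(rewards, reverse=True)[:budget])


def accept_probability(reward, budget, n_future, types, probs, rng, n_samples=200):
    """Monte Carlo stand-in for the prediction oracle (an assumption).

    Estimates the probability that the offline benchmark, facing the current
    arrival plus `n_future` i.i.d. future arrivals with `budget` slots left,
    would accept the current arrival (ties broken in its favor).
    """
    if budget <= 0:
        return 0.0
    hits = 0
    for _ in range(n_samples):
        future = rng.choices(types, weights=probs, k=n_future)
        strictly_better = sum(1 for r in future if r > reward)
        if strictly_better < budget:
            hits += 1
    return hits / n_samples


def bayes_selector(arrivals, budget, types, probs, rng):
    """Accept an arrival when the oracle says offline most likely accepts it."""
    value = 0.0
    remaining = budget
    for t, reward in enumerate(arrivals):
        n_future = len(arrivals) - t - 1
        q = accept_probability(reward, remaining, n_future, types, probs, rng)
        if remaining > 0 and q >= 0.5:
            value += reward
            remaining -= 1
    return value


if __name__ == "__main__":
    rng = random.Random(0)
    types, probs = [1.0, 2.0, 5.0], [0.5, 0.3, 0.2]
    arrivals = rng.choices(types, weights=probs, k=50)
    online = bayes_selector(arrivals, 10, types, probs, rng)
    offline = offline_value(arrivals, 10)
    # Regret on this sample path: additive loss against the offline benchmark.
    print("regret:", offline - online)
```

On any single sample path the printed regret is nonnegative, since the online policy accepts at most `budget` arrivals while the offline benchmark keeps the best `budget` of them. The expected regret discussed above is the average of this quantity over arrival sequences.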
