Approximate Dynamic Programming for Large-Scale Resource Allocation Problems

We present modeling and solution strategies for large-scale resource allocation problems that take place over multiple time periods under uncertainty. In general, the strategies we present formulate the problem as a dynamic program and replace the value functions with tractable approximations. The approximations of the value functions are obtained by using simulated trajectories of the system and iteratively improving on (possibly naive) initial approximations; we propose several improvement algorithms for this purpose. As a result, the resource allocation problem decomposes into time-staged subproblems, where the impact of the current decisions on the future evolution of the system is assessed through value function approximations. Computational experiments indicate that the strategies we present yield high-quality solutions. We also present comparisons with conventional stochastic programming methods.
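
The general idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a toy single-product inventory problem, and every name and parameter in it (`train`, `best_order`, `post_value`, `restore_concavity`, the price, cost, and demand range) is hypothetical. Each value function is approximated by a list of unit slopes (a piecewise-linear function of the inventory held after ordering), starting from a naive all-zero guess. Simulated demand trajectories supply sample slopes, which are smoothed into the approximation; a simple leveling step, chosen here for illustration, keeps the slopes nonincreasing so the approximation stays concave.

```python
import random

def vbar(slopes_t, y):
    """Approximate value of holding y units: sum of the first y unit slopes."""
    return sum(slopes_t[:y])

def best_order(slopes_t, inv, cost, cap):
    """Time-staged subproblem: choose the order size x that maximizes
    the immediate (negative) cost plus the approximate future value."""
    max_x = min(cap, len(slopes_t) - inv)
    return max(range(max_x + 1),
               key=lambda x: -cost * x + vbar(slopes_t, inv + x))

def post_value(slopes, t, y, demand, price, hold, cost, cap):
    """Sampled value of holding y units after ordering in period t,
    using the next period's approximation for everything downstream."""
    sales = min(y, demand)
    left = y - sales
    value = price * sales - hold * left
    if t + 1 < len(slopes):  # no value is assigned beyond the last period
        x = best_order(slopes[t + 1], left, cost, cap)
        value += -cost * x + vbar(slopes[t + 1], left + x)
    return value

def restore_concavity(slopes_t, k):
    """Leveling heuristic (an assumption of this sketch): after slope k
    changes, clip its neighbors so the slopes remain nonincreasing."""
    for i in range(k):
        slopes_t[i] = max(slopes_t[i], slopes_t[k])
    for i in range(k + 1, len(slopes_t)):
        slopes_t[i] = min(slopes_t[i], slopes_t[k])

def train(T=4, r_max=12, iters=300, price=5.0, cost=2.0, hold=0.5,
          cap=6, d_max=8, seed=0):
    """Iteratively improve naive (all-zero) slopes along simulated trajectories."""
    rng = random.Random(seed)
    slopes = [[0.0] * r_max for _ in range(T)]
    for n in range(1, iters + 1):
        step = 10.0 / (10.0 + n)  # declining smoothing step size
        inv = 0
        for t in range(T):
            y = inv + best_order(slopes[t], inv, cost, cap)
            demand = rng.randint(0, d_max)  # one sampled demand realization
            if y < r_max:
                # Sample slope: marginal value of one extra unit at level y.
                grad = (post_value(slopes, t, y + 1, demand, price, hold, cost, cap)
                        - post_value(slopes, t, y, demand, price, hold, cost, cap))
                slopes[t][y] = (1.0 - step) * slopes[t][y] + step * grad
                restore_concavity(slopes[t], y)
            inv = y - min(y, demand)  # leftover inventory carried forward
    return slopes

if __name__ == "__main__":
    learned = train()
    print([round(s, 2) for s in learned[0]])
```

Once trained, the same `best_order` call serves as the decision rule in each period, which is the sense in which the problem decomposes into time-staged subproblems linked only through the value function approximations.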
