Paper Information - The Linear Programming Approach to Approximate Dynamic Programming

The Linear Programming Approach to Approximate Dynamic Programming

The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function. We develop error bounds that offer performance guarantees and also guide the selection of both basis functions and "state-relevance weights" that influence quality of the approximation. Experimental results in the domain of queueing network control provide empirical support for the methodology.
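As a sketch of the fitting problem the abstract describes, the approximation and the associated linear program can be written as follows. The notation is assumed here for illustration and does not appear on this page: a discounted cost-minimization problem with states x, actions a, stage costs g_a(x), transition probabilities P_a(x,y), discount factor \alpha in (0,1), basis functions \phi_1, ..., \phi_K with weights r, and positive state-relevance weights c(x).

```latex
% Linear architecture: fit a weighted sum of basis functions to the cost-to-go J^*
\[
  \tilde{J}(x, r) \;=\; \sum_{k=1}^{K} r_k \, \phi_k(x) \;\approx\; J^*(x)
\]

% Approximate linear program over the K weights r (assumed notation)
\[
\begin{aligned}
  \max_{r \in \mathbb{R}^K} \quad & \sum_{x} c(x) \, \tilde{J}(x, r) \\
  \text{subject to} \quad & g_a(x) + \alpha \sum_{y} P_a(x, y) \, \tilde{J}(y, r) \;\ge\; \tilde{J}(x, r)
  \qquad \text{for all states } x \text{ and actions } a .
\end{aligned}
\]
```

In this form the program has only K decision variables, one per basis function, but still one constraint per state-action pair. The weights c(x) determine how approximation error is traded off across states, which is why their choice affects the quality of the fit.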

Benjamin Van Roy | Daniela Pucci de Farias

[1] R. Bellman,et al. FUNCTIONAL APPROXIMATIONS AND DYNAMIC PROGRAMMING , 1959 .

[2] A. S. Manne. Linear Programming and Sequential Decisions , 1960 .

[3] F. d'Epenoux,et al. A Probabilistic Production and Inventory Problem , 1963 .

[4] Rutherford Aris,et al. Discrete Dynamic Programming , 1965, The Mathematical Gazette.

[5] D. Luenberger. Optimization by Vector Space Methods , 1968 .

[6] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria , 1969 .

[7] E. Denardo. On Linear Programming in a Markov Decision Problem , 1970 .

[8] R. Dudley. Central Limit Theorems for Empirical Measures , 1978 .

[9] A. Hordijk,et al. Linear Programming and Markov Decision Chains , 1979 .

[10] P. Schweitzer,et al. Generalized polynomial approximations in Markovian decision processes , 1985 .

[11] V. Borkar. A convex analytic approach to Markov decision processes , 1988 .

[12] David Haussler,et al. Equivalence of models for polynomial learnability , 1988, COLT '88.

[13] P. R. Kumar,et al. Dynamic instabilities and stabilization methods in distributed real-time scheduling of manufacturing systems , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.

[14] R. Durrett. Probability: Theory and Examples , 1993 .

[15] Martin Grötschel,et al. Solution of large-scale symmetric travelling salesman problems , 1991, Math. Program..

[16] A. Michael,et al. A Linear Programming Approach to Solving Stochastic Dynamic Programs , 1993 .

[17] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[18] Sean P. Meyn,et al. Duality and linear programs for stability and performance analysis of queueing networks and scheduling policies , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.

[19] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.

[20] Kenneth L. Clarkson,et al. Las Vegas algorithms for linear and integer programming when the dimension is small , 1995, JACM.

[21] Thomas G. Dietterich,et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network , 1995, NIPS 1995.

[22] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[23] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[24] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[25] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.

[26] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.

[27] Mathukumalli Vidyasagar,et al. A Theory of Learning and Generalization , 1997 .

[28] Stanley E. Zin,et al. SPLINE APPROXIMATIONS TO VALUE FUNCTIONS: Linear Programming Approach , 1997 .

[29] P. Marbach. Simulation-Based Methods for Markov Decision Processes , 1998 .

[30] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .

[31] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.

[32] Christine A. Shoemaker,et al. Applying Experimental Design and Regression Splines to High-Dimensional Continuous-State Stochastic Dynamic Programming , 1999, Oper. Res..

[33] J. R. Morrison,et al. New Linear Program Performance Bounds for Queueing Networks , 1999 .

[34] Geoffrey J. Gordon,et al. Approximate solutions to Markov decision processes , 1999 .

[35] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.

[36] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[37] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..

[38] Sean P. Meyn,et al. Value iteration and optimization of multiclass queueing networks , 1999, Queueing Syst. Theory Appl..

[39] On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning , 2000 .

[40] John N. Tsitsiklis,et al. Call admission control and routing in integrated services networks using neuro-dynamic programming , 2000, IEEE Journal on Selected Areas in Communications.

[41] John N. Tsitsiklis,et al. Congestion-dependent pricing of network services , 2000, TNET.

[42] J. Baxter,et al. Direct gradient-based reinforcement learning , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).

[43] John N. Tsitsiklis,et al. Regression methods for pricing complex American-style options , 2001, IEEE Trans. Neural Networks.

[44] Francis A. Longstaff,et al. Valuing American Options by Simulation: A Simple Least-Squares Approach , 2001 .

[45] A. W. van der Vaart,et al. Uniform Central Limit Theorems , 2001 .

[46] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..

[47] Sean P. Meyn. Sequencing and Routing in Multiclass Queueing Networks Part I: Feedback Regulation , 2001, SIAM J. Control. Optim..

[48] Dale Schuurmans,et al. Direct value-approximation for factored MDPs , 2001, NIPS.

[49] J. Tsitsiklis,et al. Performance of Multiclass Markovian Queueing Networks Via Piecewise Linear Lyapunov Functions , 2001 .

[50] Mark S. Squillante,et al. On maximizing service-level-agreement profits , 2001, PERV.

[51] Mark S. Squillante,et al. On maximizing service-level-agreement profits , 2001, EC.

[52] Benjamin Van Roy. Neuro-Dynamic Programming: Overview and Recent Trends , 2002 .

[53] Shobha Venkataraman,et al. Efficient Solution Algorithms for Factored MDPs , 2003, J. Artif. Intell. Res..

[54] Sean P. Meyn. Sequencing and Routing in Multiclass Queueing Networks Part II: Workload Relaxations , 2003, SIAM J. Control. Optim..

[55] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.

[56] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[57] Benjamin Van Roy,et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming , 2004, Math. Oper. Res..

[58] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.

[59] Giuseppe Carlo Calafiore,et al. Uncertain convex programs: randomized solutions and confidence levels , 2005, Math. Program..

[60] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[61] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.