State partitioning based linear program for stochastic dynamic programs: An invariance property

Abstract: A common approximate dynamic programming method entails state partitioning and the use of linear programming, i.e., the state space is partitioned and the optimal value function is approximated by a constant over each partition. By minimizing a positive cost function defined on the partitions, one can construct an upper bound for the optimal value function. We show that this approximate value function is independent of the positive cost function and that it is the least upper bound, given the partitions.
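The partition-based bound described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: a discounted, reward-maximizing MDP with four states and two actions, made-up transition probabilities and rewards, and the names `P`, `R`, `PART`, `GAMMA`, `backup`, `exact_values`, and `partition_bound`. Instead of calling an LP solver, the sketch iterates the aggregated Bellman inequality to its fixed point. Under these assumptions that fixed point is feasible for the partitioned LP and lies componentwise below every other feasible point, so it would minimize any positively weighted cost. That is one way to read the invariance claim, not necessarily the authors' argument.

```python
# Hypothetical 4-state, 2-action discounted MDP; all numbers are made up.
GAMMA = 0.9  # assumed discount factor
N_STATES, N_ACTIONS, N_PARTS = 4, 2, 2

# P[u][x][y]: probability of moving from state x to state y under action u.
P = [
    [[0.7, 0.3, 0.0, 0.0],
     [0.0, 0.6, 0.4, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.2, 0.0, 0.0, 0.8]],
    [[0.1, 0.0, 0.9, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.3, 0.0, 0.7],
     [0.0, 0.0, 0.4, 0.6]],
]
# R[u][x]: one-step reward for taking action u in state x.
R = [[1.0, 0.0, 2.0, 0.5],
     [0.0, 1.5, 0.5, 3.0]]
# PART[x]: index of the partition that contains state x.
PART = [0, 0, 1, 1]


def backup(x, u, value_of):
    """One-step lookahead value of (x, u); value_of(y) gives the value used for state y."""
    return R[u][x] + GAMMA * sum(P[u][x][y] * value_of(y) for y in range(N_STATES))


def exact_values(iters=2000):
    """Plain value iteration on the full state space."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        old = v
        v = [max(backup(x, u, lambda y: old[y]) for u in range(N_ACTIONS))
             for x in range(N_STATES)]
    return v


def partition_bound(iters=2000):
    """Fixed point of w_k = max over states x in partition k and actions u of
    the backup evaluated with the piecewise-constant value w[PART[y]]."""
    w = [0.0] * N_PARTS
    for _ in range(iters):
        old = w
        new = []
        for k in range(N_PARTS):
            candidates = [backup(x, u, lambda y: old[PART[y]])
                          for x in range(N_STATES) if PART[x] == k
                          for u in range(N_ACTIONS)]
            new.append(max(candidates))
        w = new
    return w


if __name__ == "__main__":
    v_star = exact_values()
    w_star = partition_bound()
    for x in range(N_STATES):
        print(x, round(v_star[x], 4), "<=", round(w_star[PART[x]], 4))
```

Running the script prints, for each state, the exact value next to the constant assigned to its partition. The constant should never fall below the exact value, which matches the upper-bound property stated in the abstract.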

Myoungkuk Park | Meir Pachter | Phillip R. Chandler | Swaroop Darbha | Kalyanam Krishnamoorthy
