Paper information - Single sample path based optimization of Markov systems: examples and algorithms


Motivated by the need for online optimization of real-world engineering systems, we study single-sample-path-based algorithms for Markov decision problems (MDPs). A simple example illustrates the advantages of the sample-path-based approach over the traditional computation-based approach: matrix inversion is not required; some transition probabilities need not be known; storage space may be saved; and the actions can be iterated for only a subset of the state space in each iteration. We study the effect of estimation errors and the convergence properties of the sample-path-based approach. Finally, we propose a "fast" algorithm that updates the policy whenever the system reaches a particular set of states; under some conditions, the algorithm converges to the true optimal policy with probability one.
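To make the "no matrix inversion" point concrete, here is a minimal sketch, not the paper's algorithm, of estimating the average reward and the performance potentials of a fixed policy from one observed trajectory. The abstract does not specify an estimator, so the truncated-sum estimator below is an assumption chosen for illustration. The function names (`simulate_path`, `estimate_average_reward`, `estimate_potentials`), the window length `L`, and the two-state chain in the demo are all hypothetical.

```python
import random


def simulate_path(P, x0, n_steps, rng):
    """Simulate one sample path X_0, ..., X_{n_steps-1} of a finite Markov chain.

    P is only used here to generate data; the estimators below never read it.
    """
    states = list(range(len(P)))
    path = [x0]
    for _ in range(n_steps - 1):
        path.append(rng.choices(states, weights=P[path[-1]])[0])
    return path


def estimate_average_reward(path, f):
    """Time average of the reward f along the path (estimate of the long-run average)."""
    return sum(f[x] for x in path) / len(path)


def estimate_potentials(path, f, eta, L):
    """Assumed estimator: g(i) ~ E[ sum_{l<L} (f(X_l) - eta) | X_0 = i ].

    The expectation is replaced by an average over all visits to state i.
    Only differences g(i) - g(j) are meaningful; no linear system is solved.
    """
    n_states = len(f)
    # Prefix sums of the centered rewards make each window sum O(1).
    prefix = [0.0]
    for x in path:
        prefix.append(prefix[-1] + f[x] - eta)
    totals = [0.0] * n_states
    visits = [0] * n_states
    for n in range(len(path) - L + 1):
        i = path[n]
        totals[i] += prefix[n + L] - prefix[n]
        visits[i] += 1
    return [t / v if v else float("nan") for t, v in zip(totals, visits)]


if __name__ == "__main__":
    rng = random.Random(0)
    P = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical two-state chain
    f = [1.0, 3.0]                # hypothetical reward per state
    path = simulate_path(P, 0, 200_000, rng)
    eta_hat = estimate_average_reward(path, f)
    g_hat = estimate_potentials(path, f, eta_hat, L=20)
    print("estimated average reward:", eta_hat)
    print("estimated g(1) - g(0):", g_hat[1] - g_hat[0])
```

For this two-state chain the exact values are available in closed form (average reward 13/7, potential difference 2/0.7), so the estimates can be sanity-checked. In a policy-iteration setting, such potential estimates would stand in for the solution of the linear system that the computation-based approach obtains by matrix inversion.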

Xi-Ren Cao
