Online policy iteration algorithm for semi-Markov switching state-space control processes

An event-based online policy iteration algorithm is presented for hierarchical optimization problems. First, an event-driven analytical model with a dynamic hierarchy, called a semi-Markov switching state-space control process, is introduced. Then, by exploiting the dynamic hierarchical structure and the features of event-driven policies, an online adaptive optimization algorithm that combines potential estimation with policy iteration is proposed, and its convergence is proved. Finally, as an illustrative example, dynamic service composition in a service overlay network is formulated in this framework and solved with the proposed algorithm. Simulation results demonstrate the effectiveness of the algorithm.
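The abstract does not spell out the update rules, so the following is only a reference point: a minimal sketch of model-based (offline) potential-based policy iteration for a finite average-cost semi-Markov decision process. Potentials g and the average cost per unit time η come from the Poisson equation g(i) + η τ(i) = c(i) + Σ_j p_ij g(j) with g(0) = 0, and each state then switches to the action minimizing c(i,a) - η τ(i,a) + Σ_j p_ij(a) g(j). It assumes a known, unichain model with a flat state space. It is not the authors' online algorithm, which estimates the potentials from a sample path and exploits the switching hierarchy; the model format, function names, and toy numbers are hypothetical.

```python
"""Model-based potential policy iteration for a finite average-cost
semi-Markov decision process. Illustrative sketch only."""
from typing import Dict, List, Tuple

# Hypothetical model format: model[i][action] = (expected cost per sojourn,
# mean sojourn time, transition probabilities to every state).
Model = List[Dict[str, Tuple[float, float, List[float]]]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(model: Model, policy: List[str]) -> Tuple[float, List[float]]:
    """Solve g(i) + eta*tau(i) = c(i) + sum_j p_ij g(j) with g(0) = 0.

    Unknowns are x = [eta, g(1), ..., g(n-1)]; assumes the chain induced by
    `policy` is unichain so that the system is nonsingular.
    """
    n = len(model)
    a, b = [], []
    for i in range(n):
        cost, tau, probs = model[i][policy[i]]
        a.append([tau] + [(1.0 if i == j else 0.0) - probs[j] for j in range(1, n)])
        b.append(cost)
    x = solve_linear(a, b)
    return x[0], [0.0] + x[1:]


def q_value(model: Model, i: int, action: str, eta: float, g: List[float]) -> float:
    """Improvement criterion c(i,a) - eta*tau(i,a) + sum_j p_ij(a) g(j)."""
    cost, tau, probs = model[i][action]
    return cost - eta * tau + sum(p * gj for p, gj in zip(probs, g))


def policy_iteration(model: Model, policy: List[str], max_iter: int = 100,
                     tol: float = 1e-12) -> Tuple[List[str], float]:
    """Alternate potential evaluation and greedy improvement until stable."""
    policy = list(policy)
    eta = float("nan")
    for _ in range(max_iter):
        eta, g = evaluate(model, policy)
        new_policy = []
        for i in range(len(model)):
            best = min(model[i], key=lambda a: q_value(model, i, a, eta, g))
            # Switch only on strict improvement, so ties cannot cause cycling.
            gain = q_value(model, i, policy[i], eta, g) - q_value(model, i, best, eta, g)
            new_policy.append(best if gain > tol else policy[i])
        if new_policy == policy:
            return policy, eta
        policy = new_policy
    return policy, eta


if __name__ == "__main__":
    # Hypothetical two-state toy: state 0 chooses how to run, state 1 repairs.
    toy = [
        {"cheap": (1.0, 1.0, [0.5, 0.5]), "costly": (3.0, 1.0, [0.9, 0.1])},
        {"repair": (10.0, 2.0, [1.0, 0.0])},
    ]
    print(policy_iteration(toy, ["costly", "repair"]))
```

On the toy model, starting from `costly`, one improvement step switches state 0 to `cheap`, lowering the average cost per unit time from 10/3 to 3, after which the policy is stable. An online variant would replace `evaluate` with potential estimates accumulated along a single simulated or observed trajectory.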
