Event-Based Optimization of Markov Systems

Recent research indicates that both Markov decision process (MDP) and perturbation analysis (PA) based optimization can be derived easily from two fundamental performance sensitivity formulas. From this sensitivity point of view, an event-based optimization approach, including event-based sensitivity analysis and event-based policy iteration, was proposed via an example by X. R. Cao (Discrete Event Dyn. Syst.: Theory Appl., vol. 15, pp. 169-197, 2005). The approach exploits the special features of a system and shows how the performance potentials can be aggregated according to those features. It applies to many practical problems that do not fit the standard MDP formulation well. This note provides a mathematical formulation of the approach and proves its main results.
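To make the role of performance potentials concrete, the sketch below (not from the note itself) solves the Poisson equation for a small ergodic Markov chain and performs one potential-based policy-improvement step, in the spirit of [17], [21], and [35]. The function names, the normalization pi @ g = eta, and the example numbers are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def stationary_dist(P):
    """Stationary distribution pi of an ergodic transition matrix P,
    found by solving pi (I - P) = 0 with the normalization sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def potentials(P, f):
    """Performance potentials g solving the Poisson equation
    (I - P) g + eta * 1 = f; this particular solution satisfies pi @ g = eta."""
    n = P.shape[0]
    pi = stationary_dist(P)
    eta = pi @ f                      # long-run average reward
    # I - P + 1 pi^T is nonsingular for an ergodic chain
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return g, eta

def improve(P_a, f_a, g):
    """One policy-improvement step: in each state, pick the action whose
    one-step reward plus expected next-state potential is largest.
    P_a[a] / f_a[a] are the transition matrix / reward vector under action a."""
    scores = np.stack([f_a[a] + P_a[a] @ g for a in range(len(P_a))])
    return scores.argmax(axis=0)

# Tiny illustrative example (hypothetical numbers).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
f = np.array([1.0, 4.0])
g, eta = potentials(P, f)
print("potentials:", g, "average reward:", eta)
```

Note that the improvement step uses only the potentials g; in the event-based approach these state potentials would additionally be aggregated over the states at which an event can occur, which is what keeps the method tractable when actions are tied to events rather than to states.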

[1] Erhan Çinlar, et al., Introduction to stochastic processes, 1974.

[2] J. Meyer, The Role of the Group Generalized Inverse in the Theory of Finite Markov Chains, 1975.

[3] J. Brewer, The derivative of the exponential matrix with respect to a matrix, 1977.

[4] Xi-Ren Cao, et al., Perturbation analysis and optimization of queueing networks, 1983.

[5] Xi-Ren Cao, Convergence of parameter sensitivity estimates in a stochastic experiment, 1984, The 23rd IEEE Conference on Decision and Control.

[6] Anuradha M. Annaswamy, et al., Stable Adaptive Systems, 1989.

[7] Karl Johan Åström, et al., Adaptive Control, 1989.

[8] M. Fu, Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis, 1990.

[9] Xi-Ren Cao, et al., Perturbation analysis of discrete event dynamic systems, 1991.

[10] E. Chong, et al., Optimization of queues using an infinitesimal perturbation analysis-based stochastic algorithm with general update times, 1993.

[11] Satinder P. Singh, et al., Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.

[12] Martin L. Puterman, et al., Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[13] E. Chong, et al., Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, 1994, IEEE Trans. Autom. Control.

[14] Dimitri P. Bertsekas, et al., Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[15] John N. Tsitsiklis, et al., Neuro-Dynamic Programming, 1996.

[16] T. Söderström, et al., Least squares parameter estimation of continuous-time ARX models from discrete-time data, 1997, IEEE Trans. Autom. Control.

[17] Xi-Ren Cao, et al., Perturbation realization, potentials, and sensitivity analysis of Markov processes, 1997, IEEE Trans. Autom. Control.

[18] Leslie Pack Kaelbling, et al., Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[19] Jeffrey C. Lagarias, et al., Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions, 1998, SIAM J. Optim.

[20] Felisa J. Vázquez-Abad, et al., Centralized and decentralized asynchronous optimization of stochastic discrete-event systems, 1998.

[21] Xi-Ren Cao, et al., The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes, 1998, Discret. Event Dyn. Syst.

[22] John N. Tsitsiklis, et al., Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[23] Christos G. Cassandras, et al., Introduction to Discrete Event Systems, 1999, The Kluwer International Series on Discrete Event Dynamic Systems.

[24] Vivek S. Borkar, et al., Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.

[25] Leslie Pack Kaelbling, et al., Practical Reinforcement Learning in Continuous Spaces, 2000, ICML.

[26] Peter L. Bartlett, et al., Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.

[27] John N. Tsitsiklis, et al., Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.

[28] Peter L. Bartlett, et al., Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.

[29] Zhiyuan Ren, et al., A time aggregation approach to Markov decision processes, 2002, Autom.

[30] John N. Tsitsiklis, et al., Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes, 2003, Discret. Event Dyn. Syst.

[31] Sridhar Mahadevan, et al., Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.

[32] Vijay R. Konda, et al., On Actor-Critic Algorithms, 2003, SIAM J. Control. Optim.

[33] William L. Cooper, et al., Convergence of Simulation-Based Policy Iteration, 2003, Probability in the Engineering and Informational Sciences.

[34] Xi-Ren Cao, Introduction to the Special Issue on Learning, Optimization, and Decision Making in DEDS, 2003, Discret. Event Dyn. Syst.

[35] Haitao Fang, et al., Potential-based online policy iteration algorithms for Markov decision processes, 2004, IEEE Trans. Autom. Control.

[36] Erik G. Larsson, et al., The CRB for parameter estimation in irregularly sampled continuous-time ARMA systems, 2003, IEEE Signal Processing Letters.

[37] Xi-Ren Cao, et al., Basic Ideas for Event-Based Optimization of Markov Systems, 2005, Discret. Event Dyn. Syst.

[38] M. Mossberg, Identification of continuous-time ARX models using sample cross-covariances, 2005, Proceedings of the 2005 American Control Conference.

[39] Richard S. Sutton, et al., Reinforcement Learning: An Introduction, 1998.

[40] Richard S. Sutton, et al., Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[41] Warren B. Powell, et al., Handbook of Learning and Approximate Dynamic Programming, 2006.

[42] Torsten Söderström, et al., Identification of Continuous-Time ARX Models From Irregularly Sampled Data, 2007, IEEE Trans. Autom. Control.

[43] T. Söderström, et al., Estimation of Continuous-time Stochastic System Parameters, 2008.

[44] Xi-Ren Cao, et al., The nth-Order Bias Optimality for Multichain Markov Decision Processes, 2008, IEEE Trans. Autom. Control.

[45] L. Breuer, Introduction to Stochastic Processes.