Event-Based Optimization of Markov Systems
[1] Erhan Çinlar, et al. Introduction to stochastic processes, 1974.
[2] J. Meyer. The Role of the Group Generalized Inverse in the Theory of Finite Markov Chains, 1975.
[3] J. Brewer. The derivative of the exponential matrix with respect to a matrix, 1977.
[4] Xi-Ren Cao, et al. Perturbation analysis and optimization of queueing networks, 1983.
[5] Xi-Ren Cao. Convergence of parameter sensitivity estimates in a stochastic experiment, 1984, The 23rd IEEE Conference on Decision and Control.
[6] Anuradha M. Annaswamy, et al. Stable Adaptive Systems, 1989.
[7] Karl Johan Åström, et al. Adaptive Control, 1989, Embedded Digital Control with Microcontrollers.
[8] M. Fu. Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis, 1990.
[9] Xi-Ren Cao, et al. Perturbation analysis of discrete event dynamic systems, 1991.
[10] E. Chong, et al. Optimization of queues using an infinitesimal perturbation analysis-based stochastic algorithm with general update times, 1993.
[11] Satinder P. Singh, et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.
[12] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[13] E. Chong, et al. Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, 1994, IEEE Trans. Autom. Control.
[14] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[15] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[16] T. Söderström, et al. Least squares parameter estimation of continuous-time ARX models from discrete-time data, 1997, IEEE Trans. Autom. Control.
[17] Xi-Ren Cao, et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes, 1997, IEEE Trans. Autom. Control.
[18] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[19] Jeffrey C. Lagarias, et al. Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions, 1998, SIAM J. Optim.
[20] Felisa J. Vázquez-Abad, et al. Centralized and decentralized asynchronous optimization of stochastic discrete-event systems, 1998.
[21] Xi-Ren Cao, et al. The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes, 1998, Discret. Event Dyn. Syst.
[22] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[23] Christos G. Cassandras, et al. Introduction to Discrete Event Systems, 1999, The Kluwer International Series on Discrete Event Dynamic Systems.
[24] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[25] Leslie Pack Kaelbling, et al. Practical Reinforcement Learning in Continuous Spaces, 2000, ICML.
[26] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[27] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[28] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[29] Zhiyuan Ren, et al. A time aggregation approach to Markov decision processes, 2002, Autom.
[30] John N. Tsitsiklis, et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes, 2003, Discret. Event Dyn. Syst.
[31] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[32] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control. Optim.
[33] William L. Cooper, et al. Convergence of Simulation-Based Policy Iteration, 2003, Probability in the Engineering and Informational Sciences.
[34] Xi-Ren Cao. Introduction to the Special Issue on Learning, Optimization, and Decision Making in DEDS, 2003, Discret. Event Dyn. Syst.
[35] Haitao Fang, et al. Potential-based online policy iteration algorithms for Markov decision processes, 2004, IEEE Trans. Autom. Control.
[36] Erik G. Larsson, et al. The CRB for parameter estimation in irregularly sampled continuous-time ARMA systems, 2003, IEEE Signal Processing Letters.
[37] Xi-Ren Cao, et al. Basic Ideas for Event-Based Optimization of Markov Systems, 2005, Discret. Event Dyn. Syst.
[38] M. Mossberg. Identification of continuous-time ARX models using sample cross-covariances, 2005, Proceedings of the 2005 American Control Conference.
[39] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[40] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[41] Warren B. Powell, et al. Handbook of Learning and Approximate Dynamic Programming, 2006, IEEE Transactions on Automatic Control.
[42] Torsten Söderström, et al. Identification of Continuous-Time ARX Models From Irregularly Sampled Data, 2007, IEEE Transactions on Automatic Control.
[43] T. Söderström, et al. Estimation of Continuous-time Stochastic System Parameters, 2008.
[44] Xi-Ren Cao, et al. The $n$th-Order Bias Optimality for Multichain Markov Decision Processes, 2008, IEEE Transactions on Automatic Control.
[45] L. Breuer. Introduction to Stochastic Processes, 2022, Statistical Methods for Climate Scientists.