A time aggregation approach to Markov decision processes
Zhiyuan Ren | Michael C. Fu | Shalabh Bhatnagar | Xi-Ren Cao | Steven I. Marcus
[1] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[2] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[3] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[4] Xi-Ren Cao, et al. A unified approach to Markov decision problems and performance sensitivity analysis, 2000, at - Automatisierungstechnik.
[5] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[6] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[7] X. Cao, et al. Single Sample Path-Based Optimization of Markov Chains, 1999.
[8] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control. Syst. Technol.
[9] Xi-Ren Cao, et al. The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes, 1998, Discret. Event Dyn. Syst.
[10] R. Sutton. Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales, 1998.
[11] Ronald E. Parr, et al. Hierarchical control and learning for Markov decision processes, 1998.
[12] Benjamin Van Roy, et al. A neuro-dynamic programming approach to retailer inventory management, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[13] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[14] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[15] Robert G. Gallager, et al. Discrete Stochastic Processes, 1995.
[16] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[17] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.
[18] Attila Csenki, et al. Dependability for Systems with a Partitioned State Space: Markov and Semi-Markov Theory and Computational Implementation, 1994.
[19] Xi-Ren Cao, et al. Realization Probabilities: The Dynamics of Queuing Systems, 1994.
[20] Y. Ho, et al. Performance gradient estimation for the very large finite Markov chains, 1991.
[21] H. Khalil, et al. Aggregation of the policy iteration method for nearly completely decomposable Markov chains, 1991.
[22] P. Varaiya, et al. Multilayer control of large Markov chains, 1978.
[23] D. Vere-Jones. Markov Chains, 1972, Nature.
[24] S. Bhatnagar. The Indian Institute of Science, Bangalore, 1924, Nature.