Paper information - A time aggregation approach to Markov decision processes - 字舞流文

A time aggregation approach to Markov decision processes

Zhiyuan Ren | Michael C. Fu | Shalabh Bhatnagar | Xi-Ren Cao | Steven I. Marcus

[1] Peter Dayan, et al. Q-learning, 1992, Machine Learning.

[2] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.

[3] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.

[4] Xi-Ren Cao, et al. A unified approach to Markov decision problems and performance sensitivity analysis, 2000, at - Automatisierungstechnik.

[5] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[6] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[7] X. Cao, et al. Single Sample Path-Based Optimization of Markov Chains, 1999.

[8] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control. Syst. Technol.

[9] Xi-Ren Cao, et al. The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes, 1998, Discret. Event Dyn. Syst.

[10] R. Sutton. Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales, 1998.

[11] Ronald E. Parr, et al. Hierarchical control and learning for Markov decision processes, 1998.

[12] Benjamin Van Roy, et al. A neuro-dynamic programming approach to retailer inventory management, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[13] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[14] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[15] Robert G. Gallager, et al. Discrete Stochastic Processes, 1995.

[16] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[17] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.

[18] Attila Csenki, et al. Dependability for Systems with a Partitioned State Space: Markov and Semi-Markov Theory and Computational Implementation, 1994.

[19] Xi-Ren Cao, et al. Realization Probabilities: The Dynamics of Queuing Systems, 1994.

[20] Y. Ho, et al. Performance gradient estimation for the very large finite Markov chains, 1991.

[21] H. Khalil, et al. Aggregation of the policy iteration method for nearly completely decomposable Markov chains, 1991.

[22] P. Varaiya, et al. Multilayer control of large Markov chains, 1978.

[23] D. Vere-Jones. Markov Chains, 1972, Nature.

[24] S. Bhatnagar. The Indian Institute of Science, Bangalore, 1924, Nature.