A time aggregation approach to Markov decision processes
