Using Control Theory for Analysis of Reinforcement Learning and Optimal Policy Properties in Grid-World Problems
Naser Pariz, Mohammad-Bagher Naghibi-Sistani, S. Mostapha Kalami Heris
[1] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[3] M. Uschold, et al. Methods and applications, 1953.
[4] Robert Givan, et al. Bounded-parameter Markov decision processes, 2000, Artif. Intell.
[5] Stuart I. Reynolds. Reinforcement Learning with Exploration, 2002.
[6] Katsuhiko Ogata, et al. Discrete-time control systems (2nd ed.), 1995.
[7] Q. Hu, et al. Markov decision processes with their applications, 2007.
[8] Daniela Pucci de Farias, et al. Approximate value iteration and temporal-difference learning, 2000.
[9] Manuela Veloso, et al. Probabilistic Reuse of Past Policies, 2005.
[10] Benjamin Van Roy. Neuro-Dynamic Programming: Overview and Recent Trends, 2002.
[11] Geoffrey E. Hinton, et al. Reinforcement learning for factored Markov decision processes, 2002.
[12] Art Lew, et al. Dynamic Programming: A Computational Tool, 2006.
[13] James E. Smith, et al. Structural Properties of Stochastic Dynamic Programs, 2002, Oper. Res.
[14] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[15] Manuela Veloso, et al. Building a Library of Policies through Policy Reuse, 2005.
[16] Steven I. Marcus, et al. A survey of some simulation-based algorithms for Markov decision processes, 2007, Commun. Inf. Syst.
[17] A. Cassandra, et al. Exact and approximate algorithms for partially observable Markov decision processes, 1998.
[18] Larry D. Pyeatt. Integration of Partially Observable Markov Decision Processes and Reinforcement Learning for Simulat, 1999.
[19] Benjamin Van Roy, et al. On the existence of fixed points for approximate value iteration and temporal-difference learning, 2000.
[20] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.
[21] Warren B. Powell, et al. Handbook of Learning and Approximate Dynamic Programming, 2006, IEEE Transactions on Automatic Control.
[22] Manuela Veloso, et al. Exploration and Policy Reuse, 2005.
[23] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[24] Eric A. Hansen, et al. An Improved Policy Iteration Algorithm for Partially Observable MDPs, 1997, NIPS.
[25] Katsuhiko Ogata, et al. Discrete-time control systems, 1987.
[26] Weihong Zhang, et al. Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes, 2011, J. Artif. Intell. Res.
[27] Michael C. Fu, et al. Monotone Optimal Policies for a Transient Queueing Staffing Problem, 2000, Oper. Res.
[28] Jiaqiao Hu, et al. Simulation-based Algorithms for Markov Decision Processes (Communications and Control Engineering), 2007.
[29] Daniel S. Bernstein, et al. Reusing Old Policies to Accelerate Learning on New MDPs, 1999.
[30] A. Shwartz, et al. Handbook of Markov decision processes: methods and applications, 2002.
[31] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.