Efficient Planning in MDPs by Small Backups
[1] R. Bellman. A Markovian Decision Process, 1957.
[2] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time, 1993.
[3] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
[4] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[5] Leslie Pack Kaelbling, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[6] David Andre, et al. Generalized Prioritized Sweeping, 1997, NIPS.
[7] Marco Wiering, et al. Efficient Model-Based Exploration, 1998.
[8] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[9] Blai Bonet, et al. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming, 2003, ICAPS.
[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[11] H. Brendan McMahan, et al. Fast Exact Planning in Markov Decision Processes, 2005, ICAPS.
[12] Jesse Hoey, et al. Efficient Planning in R-max, 2011, AAMAS.
[13] Harm van Seijen, et al. Exploiting Best-Match Equations for Efficient Reinforcement Learning, 2011, J. Mach. Learn. Res.
[14] Shimon Whiteson, et al. V-MAX: Tempered Optimism for Better PAC Reinforcement Learning, 2012, AAMAS.