Hill Climbing on Value Estimates for Search-control in Dyna
Yangchen Pan | Hengshuai Yao | Amir-massoud Farahmand | Martha White
[1] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[2] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.
[3] Alborz Geramifard, et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping, 2008, UAI.
[4] J. Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, IEEE International Conference on Neural Networks.
[5] C. Hwang, et al. Diffusion for global optimization in R^n, 1987.
[6] Hiroshi Nakagawa, et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process, 2014, ICML.
[7] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[8] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, ACM SIGART Bulletin.
[9] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[10] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[11] The Annals of Applied Probability: an official journal of the Institute of Mathematical Statistics, 1991.
[12] Bruno Castro da Silva, et al. Energetic Natural Gradient Descent, 2016, ICML.
[13] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[14] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[15] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[16] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[17] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[18] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[19] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[20] José Antonio Martín H., et al. Dyna-H: A heuristic planning reinforcement learning algorithm applied to role-playing game strategy decision systems, 2011, Knowl. Based Syst.
[21] Martha White, et al. Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains, 2018, IJCAI.
[22] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[23] Rich Sutton, et al. A Deeper Look at Planning as Learning from Replay, 2015, ICML.
[24] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[25] M. V. Rossum, et al. Neural Computation, 2022.
[26] Shun-ichi Amari, et al. Why natural gradient?, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.
[27] Robert Babuska, et al. Experience Replay for Real-Time Reinforcement Learning Control, 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[28] Erik Talvitie, et al. The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces, 2018, ArXiv.
[29] Wulfram Gerstner, et al. Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation, 2018, ICML.
[30] Kam-Fai Wong, et al. Integrating planning for task-completion dialogue policy learning, 2018, ACL.
[31] D. Signorini, et al. Neural networks, 1995, The Lancet.
[32] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[33] V. Kaul, et al. Planning, 2012.
[34] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[35] Seth Hutchinson, et al. An integrated architecture for learning and planning in robotic domains, 1991, ACM SIGART Bulletin.
[36] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.
[37] É. Moulines, et al. Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm, 2015, arXiv:1507.05021.