Alan Fern | Somdeb Majumdar | Anurag Koul | Varun V. Kumar
[1] Jürgen Schmidhuber,et al. World Models , 2018, ArXiv.
[2] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[3] Christopher D. Rosin,et al. Multi-armed bandits with episode context , 2011, Annals of Mathematics and Artificial Intelligence.
[4] Catholijn M. Jonker,et al. A0C: Alpha Zero in Continuous Action Space , 2018, ArXiv.
[5] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[6] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[7] Andriy Mnih,et al. Q-Learning in enormous action spaces via amortized approximate maximization , 2020, ArXiv.
[8] Sham M. Kakade,et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control , 2018, ICLR.
[9] H. Jaap van den Herik,et al. Progressive Strategies for Monte-Carlo Tree Search , 2008.
[10] Nataliya Sokolovska,et al. Continuous Upper Confidence Trees , 2011, LION.
[11] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[12] Rémi Coulom,et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.
[13] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[14] Sergey Levine,et al. SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning , 2018, ICML.
[15] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[16] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[17] Sergey Levine,et al. Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model , 2019, NeurIPS.
[18] Csaba Szepesvári,et al. Online Optimization in X-Armed Bandits , 2008, NIPS.
[19] Michael L. Littman,et al. Sample-Based Planning for Continuous Action Markov Decision Processes , 2011, ICAPS.
[20] Ruben Villegas,et al. Learning Latent Dynamics for Planning from Pixels , 2018, ICML.
[21] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[22] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[23] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[24] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[25] Mohammad Norouzi,et al. Dream to Control: Learning Behaviors by Latent Imagination , 2019, ICLR.
[26] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[27] Aaron van den Oord,et al. Shaping Belief States with Generative Environment Models for RL , 2019, NeurIPS.