暂无分享,去创建一个
[1] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[2] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[3] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SGAR.
[4] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[5] Shie Mannor,et al. A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..
[6] E. Todorov,et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems , 2005, Proceedings of the 2005, American Control Conference, 2005..
[7] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[8] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[9] Gaurav S. Sukhatme,et al. Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning , 2017, ICML.
[10] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[11] Matthias Hein,et al. The Loss Surface of Deep and Wide Neural Networks , 2017, ICML.
[12] Yuandong Tian,et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees , 2018, ICLR.
[13] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[14] Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
[15] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[16] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[17] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.
[18] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[19] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.
[20] Arthur G. Richards,et al. Robust constrained model predictive control , 2005 .
[21] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[22] Sergey Levine,et al. SOLAR: Deep Structured Latent Representations for Model-Based Reinforcement Learning , 2018, ArXiv.
[23] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[24] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[25] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[26] Sergey Levine,et al. SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning , 2018, ICML.
[27] Sergey Levine,et al. Model-Based Reinforcement Learning for Atari , 2019, ICLR.
[28] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[29] Sergey Levine,et al. When to Trust Your Model: Model-Based Policy Optimization , 2019, NeurIPS.
[30] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[31] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[32] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[33] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[34] Dirk P. Kroese,et al. The cross-entropy method for estimation , 2013 .
[35] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[36] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[37] Jürgen Schmidhuber,et al. Recurrent World Models Facilitate Policy Evolution , 2018, NeurIPS.
[38] Ruben Villegas,et al. Learning Latent Dynamics for Planning from Pixels , 2018, ICML.
[39] Elad Hoffer,et al. Exponentially vanishing sub-optimal local minima in multilayer neural networks , 2017, ICLR.
[40] Dirk P. Kroese,et al. Chapter 3 – The Cross-Entropy Method for Optimization , 2013 .
[41] Sergey Levine,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.
[42] Hao Li,et al. Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.
[43] Yuval Tassa,et al. Synthesis and stabilization of complex behaviors through online trajectory optimization , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.