Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Honglak Lee | George Tucker | Danijar Hafner | Eugene Brevdo | Jacob Buckman