Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction
Jianye Hao | Zhaopeng Meng | Chen Chen | Guangyong Chen | Hongyao Tang | Yaodong Yang | Luo Zhang | Pengfei Chen | Wulong Liu