γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
[1] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[2] David Warde-Farley, et al. Fast Task Inference with Variational Intrinsic Successor Features, 2019, ICLR.
[3] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[4] Iain Murray, et al. Neural Spline Flows, 2019, NeurIPS.
[5] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.
[6] Kavosh Asadi, et al. Combating the Compounding-Error Problem with a Multi-step Model, 2019, ArXiv.
[7] Yuandong Tian, et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees, 2018, ICLR.
[8] Byron Boots, et al. Differentiable MPC for End-to-end Planning and Control, 2018, NeurIPS.
[9] Rob Fergus, et al. Understanding the Asymptotic Performance of Model-Based RL Methods, 2018.
[10] Samuel J. Gershman, et al. The Successor Representation: Its Computational Logic and Neural Substrates, 2018, The Journal of Neuroscience.
[11] Honglak Lee, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[12] Tom Schaul, et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement, 2018, ICML.
[13] Sergey Levine, et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[14] Sergey Levine, et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, 2018, ArXiv.
[15] Sergey Levine, et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control, 2018, ICLR.
[16] Yoshua Bengio, et al. Universal Successor Representations for Transfer Reinforcement Learning, 2018, ICLR.
[17] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[18] Sergey Levine, et al. Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[19] Sergey Levine, et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[20] Gabriel Kalweit, et al. Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning, 2017, CoRL.
[21] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[22] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[23] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[24] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[25] Erik Talvitie, et al. Self-Correcting Models for Model-Based Reinforcement Learning, 2016, AAAI.
[26] Raymond Y. K. Lau, et al. Least Squares Generative Adversarial Networks, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] M. Botvinick, et al. The successor representation in human reinforcement learning, 2016, Nature Human Behaviour.
[28] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[29] Samuel Gershman, et al. Deep Successor Reinforcement Learning, 2016, ArXiv.
[30] Sebastian Nowozin, et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, 2016, NIPS.
[31] Pieter Abbeel, et al. Value Iteration Networks, 2016, NIPS.
[32] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[33] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[34] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.
[35] Shakir Mohamed, et al. Variational Inference with Normalizing Flows, 2015, ICML.
[36] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[37] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[38] Jonas Buchli, et al. Learning of closed-loop motion control, 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[39] Martin A. Riedmiller, et al. Approximate model-assisted Neural Fitted Q-Iteration, 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[40] Erik Talvitie, et al. Model Regularization for Stable Sample Rollouts, 2014, UAI.
[41] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[42] Richard S. Sutton, et al. Sample-based learning and search with permanent and transient memories, 2008, ICML '08.
[43] Peter Dayan, et al. Structure in the Space of Value Functions, 2002, Machine Learning.
[44] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[45] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[46] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[47] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[48] Leslie Pack Kaelbling, et al. Learning to Achieve Goals, 1993, IJCAI.
[49] Michael I. Jordan, et al. Forward Models: Supervised Learning with a Distal Teacher, 1992, Cogn. Sci.
[50] B. Widrow, et al. Neural networks for self-learning control systems, 1990, IEEE Control Systems Magazine.
[51] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[52] Andrew W. Moore, et al. Efficient memory-based learning for robot control, 1990.