暂无分享,去创建一个
[1] Jonas Buchli,et al. Learning of closed-loop motion control , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[2] David Warde-Farley,et al. Fast Task Inference with Variational Intrinsic Successor Features , 2019, ICLR.
[3] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[4] Erik Talvitie,et al. Model Regularization for Stable Sample Rollouts , 2014, UAI.
[5] Yoshua Bengio,et al. Universal Successor Representations for Transfer Reinforcement Learning , 2018, ICLR.
[6] Peter Dayan,et al. Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.
[7] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[8] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[9] Michael I. Jordan,et al. Forward Models: Supervised Learning with a Distal Teacher , 1992, Cogn. Sci..
[10] Shakir Mohamed,et al. Variational Inference with Normalizing Flows , 2015, ICML.
[11] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[12] Martin A. Riedmiller,et al. Approximate model-assisted Neural Fitted Q-Iteration , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[13] Pieter Abbeel,et al. Value Iteration Networks , 2016, NIPS.
[14] Sergey Levine,et al. Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[15] Tom Schaul,et al. Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.
[16] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[17] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[18] Iain Murray,et al. Neural Spline Flows , 2019, NeurIPS.
[19] B. Widrow,et al. Neural networks for self-learning control systems , 1990, IEEE Control Systems Magazine.
[20] Tom Schaul,et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement , 2018, ICML.
[21] Yuandong Tian,et al. Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees , 2018, ICLR.
[22] Samuel J Gershman,et al. The Successor Representation: Its Computational Logic and Neural Substrates , 2018, The Journal of Neuroscience.
[23] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[24] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.
[25] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[26] Tom Schaul,et al. The Predictron: End-To-End Learning and Planning , 2016, ICML.
[27] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[28] Kavosh Asadi,et al. Combating the Compounding-Error Problem with a Multi-step Model , 2019, ArXiv.
[29] Sergey Levine,et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control , 2018, ICLR.
[30] Peter Dayan,et al. Structure in the Space of Value Functions , 2002, Machine Learning.
[31] Leslie Pack Kaelbling,et al. Learning to Achieve Goals , 1993, IJCAI.
[32] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[33] Sergey Levine,et al. When to Trust Your Model: Model-Based Policy Optimization , 2019, NeurIPS.
[34] Raymond Y. K. Lau,et al. Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[35] Richard S. Sutton,et al. Sample-based learning and search with permanent and transient memories , 2008, ICML '08.
[36] M. Botvinick,et al. The successor representation in human reinforcement learning , 2016, Nature Human Behaviour.
[37] Sergey Levine,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.
[38] Sergey Levine,et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction , 2019, NeurIPS.
[39] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[40] Richard S. Sutton,et al. Generalization in ReinforcementLearning : Successful Examples UsingSparse Coarse , 1996 .
[41] Erik Talvitie,et al. Self-Correcting Models for Model-Based Reinforcement Learning , 2016, AAAI.
[42] Samuel Gershman,et al. Deep Successor Reinforcement Learning , 2016, ArXiv.
[43] Byron Boots,et al. Differentiable MPC for End-to-end Planning and Control , 2018, NeurIPS.
[44] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[45] Rob Fergus,et al. Understanding the Asymptotic Performance of Model-Based RL Methods , 2018 .
[46] Sebastian Nowozin,et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization , 2016, NIPS.
[47] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[48] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[49] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[50] Gabriel Kalweit,et al. Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning , 2017, CoRL.