Maruan Al-Shedivat | Trapit Bansal | Yuri Burda | Ilya Sutskever | Igor Mordatch | Pieter Abbeel
[1] Rich Caruana, et al. Multitask Learning, 1998, Encyclopedia of Machine Learning and Data Mining.
[2] Qiang Yang, et al. Lifelong Machine Learning Systems: Beyond Learning Algorithms, 2013, AAAI Spring Symposium: Lifelong Machine Learning.
[3] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[4] Richard S. Sutton, et al. On the role of tracking in stationary environments, 2007, ICML '07.
[5] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[6] Jürgen Schmidhuber. Learning to Control Fast-weight Memories: An Alternative to Dynamic Recurrent Networks, 1991.
[7] Sebastian Thrun, et al. Learning to Learn, 1998, Springer US.
[8] Victor R. Lesser, et al. Multi-Agent Learning with Policy Prediction, 2010, AAAI.
[9] Richard S. Zemel, et al. Prototypical Networks for Few-shot Learning, 2017, NIPS.
[10] Peter L. Bartlett, et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[11] Yoshua Bengio, et al. Learning a synaptic learning rule, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[12] Paulo Martins Engel, et al. Dealing with non-stationary environments using context detection, 2006, ICML.
[13] Jitendra Malik, et al. Learning to Optimize, 2016, ICLR.
[14] Thomas L. Griffiths, et al. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes, 2018, ICLR.
[15] Tom Minka, et al. TrueSkill™: A Bayesian Skill Rating System, 2006, NIPS.
[16] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[17] Amos J. Storkey, et al. Towards a Neural Statistician, 2016, ICLR.
[18] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[19] Jakub W. Pachocki, et al. Emergent Complexity via Multi-Agent Competition, 2017, ICLR.
[20] Mark B. Ring. CHILD: A First Step Towards Continual Learning, 1997, Machine Learning.
[21] Hod Lipson, et al. Resilient Machines Through Continuous Self-Modeling, 2006, Science.
[22] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[23] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[24] Michael H. Bowling, et al. Convergence and No-Regret in Multiagent Learning, 2004, NIPS.
[25] Vincent Conitzer, et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents, 2003, Machine Learning.
[26] Shimon Whiteson, et al. Learning with Opponent-Learning Awareness, 2017, AAMAS.
[27] Sebastian Thrun, et al. Lifelong Learning Algorithms, 1998, Learning to Learn.
[28] Eric P. Xing, et al. Contextual Explanation Networks, 2017, J. Mach. Learn. Res.
[29] Hugo Larochelle, et al. Optimization as a Model for Few-Shot Learning, 2016, ICLR.
[30] Rich Caruana, et al. Multitask Learning, 1997, Machine Learning.
[31] Oriol Vinyals, et al. Matching Networks for One Shot Learning, 2016, NIPS.
[32] Wenwu Yu, et al. An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination, 2012, IEEE Transactions on Industrial Informatics.
[33] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[34] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[35] Sepp Hochreiter, et al. Learning to Learn Using Gradient Descent, 2001, ICANN.
[36] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[37] Jonathan L. Shapiro, et al. Opponent Modelling by Sequence Prediction and Lookahead in Two-Player Games, 2013, ICAISC.
[38] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[39] Jianfeng Gao, et al. Deep Reinforcement Learning for Dialogue Generation, 2016, EMNLP.
[40] Sergey Bartunov, et al. Meta-Learning with Memory-Augmented Neural Networks, 2016, ICML.
[41] Xinlei Chen, et al. Never-Ending Learning, 2012, ECAI.
[42] Yoshua Bengio, et al. On the Optimization of a Synaptic Learning Rule, 2007.
[43] Jun Wang, et al. Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games, 2017, ArXiv.
[44] Yishay Mansour, et al. Nash Convergence of Gradient Dynamics in General-Sum Games, 2000, UAI.
[45] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[46] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.
[47] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[48] Peng Peng, et al. Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games, 2017, ArXiv, abs/1703.10069.
[49] Mark B. Ring. Continual learning in reinforcement environments, 1995, GMD-Bericht.
[50] Shimon Whiteson, et al. DiCE: The Infinitely Differentiable Monte-Carlo Estimator, 2018, ICML.
[51] Antoine Cully, et al. Robots that can adapt like animals, 2015, Nature.
[52] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[53] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[54] James Davidson, et al. Supervision via competition: Robot adversaries for learning tasks, 2016, IEEE International Conference on Robotics and Automation (ICRA) 2017.