Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

The ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence. In this paper, we cast the problem of continuous adaptation into the learning-to-learn framework. We develop a simple gradient-based meta-learning algorithm suitable for adaptation in dynamically changing and adversarial scenarios. Additionally, we design a new multi-agent competitive environment, RoboSumo, and define iterated adaptation games for testing various aspects of continuous adaptation strategies. We demonstrate that meta-learning enables significantly more efficient adaptation than reactive baselines in the few-shot regime. Our experiments with a population of agents that learn and compete suggest that meta-learners are the fittest.
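To make the idea of gradient-based meta-adaptation concrete, the following is a minimal, hypothetical sketch in the spirit of MAML-style algorithms: meta-learned parameters are adapted with a few inner gradient steps on experience from the most recent task, then updated so that the adapted parameters perform well on the next (shifted) task. The toy drifting linear-regression tasks, step sizes, and first-order outer update are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of gradient-based meta-adaptation under nonstationarity.
# The tasks, learning rates, and first-order outer update are assumptions for
# illustration only, not the algorithm described in the paper.
import numpy as np

rng = np.random.default_rng(0)

def make_task(slope):
    """A toy nonstationary 'task': 1-D linear regression with a drifting slope."""
    x = rng.uniform(-1.0, 1.0, size=(32, 1))
    y = slope * x + 0.05 * rng.normal(size=x.shape)
    return x, y

def loss_and_grad(theta, x, y):
    """Mean-squared error of y_hat = theta * x, and its gradient w.r.t. theta."""
    err = theta * x - y
    return float(np.mean(err ** 2)), float(np.mean(2.0 * err * x))

def adapt(theta, x, y, inner_lr=0.5, steps=3):
    """Few-shot adaptation: a handful of gradient steps from the meta-parameters."""
    phi = theta
    for _ in range(steps):
        _, g = loss_and_grad(phi, x, y)
        phi -= inner_lr * g
    return phi

theta, outer_lr = 0.0, 0.1
for episode in range(200):
    slope = np.sin(0.1 * episode)                       # environment drifts over episodes
    x_old, y_old = make_task(slope)                     # experience from the previous task
    x_new, y_new = make_task(np.sin(0.1 * (episode + 1)))
    phi = adapt(theta, x_old, y_old)                    # inner loop: adapt to recent experience
    _, g_outer = loss_and_grad(phi, x_new, y_new)       # evaluate adapted params on the shifted task
    theta -= outer_lr * g_outer                         # first-order outer update of meta-parameters
```

In this sketch the meta-parameters settle at a point from which a few gradient steps suffice to track the drifting slope, which is the intuition behind adapting from limited experience as the environment changes.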
