Towards Continual Reinforcement Learning: A Review and Perspectives

In this article, we aim to provide a literature review of different formulations of and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of continual RL formulations and mathematically characterize the non-stationary dynamics of each setting. We go on to discuss the evaluation of continual RL agents, providing an overview of benchmarks used in the literature and of important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL holds the promise of developing better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role, such as healthcare, education, logistics, and robotics.
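To make the notion of non-stationary dynamics concrete, one common formalization (an illustrative sketch only; the taxonomy developed in the article may use different notation and finer-grained distinctions) treats the environment as a time-indexed family of MDPs,

\[
  \mathcal{M}_t = \langle \mathcal{S}, \mathcal{A}, P_t, R_t, \gamma \rangle,
  \qquad P_t(s' \mid s, a), \quad R_t(s, a),
\]

where the transition kernel $P_t$ and reward function $R_t$ may change with time $t$. The standard stationary RL setting is recovered when $P_t = P$ and $R_t = R$ for all $t$, and different continual RL settings can be distinguished by how, and how often, these functions are allowed to change.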
