Demystifying Reproducibility in Meta- and Multi-Task Reinforcement Learning

Establishing the significance of experimental results in reinforcement learning (RL) is difficult. This is compounded by the additional complexity of meta- and multi-task RL (MTRL), a rapidly-growing research area that lacks well-defined baselines. We analyze several design decisions each author must make when implementing a meta-RL or MTRL algorithm, and use over 500 experiments to show that these seemingly small details can create statistically-significant variations in a single algorithm's performance that exceed the reported performance differences between algorithms themselves. Informed by this analysis, we precisely define several important hyperparameters, design decisions, and evaluation metrics for meta-RL and MTRL methods, so that these methods can be compared reproducibly. We then provide multi-seed benchmark results for seven of the most popular meta-RL and MTRL algorithms on the most challenging benchmarks currently available. Finally, we share with the community an open-source package of reference implementations of these algorithms, which use our consistent definitions, achieve state-of-the-art performance, and seek to follow the original works introducing these algorithms as closely as possible.
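
To make the multi-seed comparison concrete, the following is a minimal sketch, not drawn from the paper's released code, of how one might test whether two implementation variants of the same algorithm differ significantly across random seeds. All data values and variable names here are hypothetical and purely illustrative.

# Illustrative sketch: comparing two implementation variants across seeds.
# The return values below are made up; in practice they would be the final
# evaluation returns from N independent training runs per variant.
import numpy as np
from scipy import stats

returns_variant_a = np.array([412.3, 398.7, 441.0, 405.2, 420.8, 389.5, 430.1, 401.9])
returns_variant_b = np.array([455.6, 470.2, 448.9, 462.3, 439.7, 468.0, 451.4, 459.8])

# Welch's t-test: does not assume equal variances across the two variants.
t_stat, p_value = stats.ttest_ind(returns_variant_a, returns_variant_b, equal_var=False)

# Bootstrap confidence interval on the difference of mean returns.
rng = np.random.default_rng(seed=0)
boot_diffs = [
    rng.choice(returns_variant_b, size=len(returns_variant_b), replace=True).mean()
    - rng.choice(returns_variant_a, size=len(returns_variant_a), replace=True).mean()
    for _ in range(10_000)
]
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(f"Welch's t-test: t={t_stat:.2f}, p={p_value:.4f}")
print(f"95% bootstrap CI for mean difference: [{ci_low:.1f}, {ci_high:.1f}]")

If the confidence interval on the mean difference excludes zero, the seemingly small implementation change is producing a performance gap of the kind the paper warns can exceed reported between-algorithm differences.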
