Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking

Deep reinforcement learning has been one of the fastest-growing fields of machine learning in recent years, and numerous libraries have been open-sourced to support research. However, most codebases have a steep learning curve or limited flexibility, neither of which satisfies the need for fast prototyping in fundamental research. This paper introduces Tonic, a Python library that lets researchers quickly implement new ideas and measure their importance by providing: 1) a collection of configurable modules such as exploration strategies, replays, neural networks, and updaters; 2) a collection of baseline agents (A2C, TRPO, PPO, MPO, DDPG, D4PG, TD3, and SAC) built with these modules; 3) support for the two most popular deep learning frameworks, TensorFlow 2 and PyTorch; 4) support for the three most popular sets of continuous-control environments, OpenAI Gym, DeepMind Control Suite, and PyBullet; 5) a large-scale benchmark of the baseline agents on 70 continuous-control tasks; and 6) scripts to run experiments reproducibly, plot results, and play with trained agents.
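
To make point 6 concrete, below is a minimal sketch of launching a reproducible training run with Tonic. The call signature is an assumption inferred from the abstract, not a verified API reference: the train entry point, the string-valued agent and environment arguments, and the module paths tonic.torch.agents.PPO and tonic.environments.Gym should all be read as illustrative.

    # Minimal sketch (assumed API, not verified): train the PPO baseline
    # agent on an OpenAI Gym task with the PyTorch backend, using a fixed
    # seed so the experiment can be reproduced.
    import tonic        # assumed top-level package exposing train
    import tonic.torch  # PyTorch backend; tonic.tensorflow would select TF2

    tonic.train(
        header='import tonic.torch',                        # backend to load before building the agent
        agent='tonic.torch.agents.PPO()',                   # one of the eight baseline agents
        environment="tonic.environments.Gym('Hopper-v3')",  # a continuous-control task
        name='PPO-Hopper',                                  # experiment name used for logging
        seed=0,                                             # fixed seed for reproducibility
    )

The logged results could then be visualized and the trained agent replayed with the plotting and playing scripts the abstract mentions.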
