TorchRL: A data-driven decision-making library for PyTorch

Striking a balance between integration and modularity is crucial for a machine learning library to be versatile and user-friendly, especially when handling decision and control tasks that involve large development teams and complex real-world data and environments. To address this issue, we propose TorchRL, a general-purpose control library for PyTorch that provides well-integrated yet standalone components. With a versatile and robust primitive design, TorchRL facilitates streamlined algorithm development across the many branches of Reinforcement Learning (RL) and control. We introduce a new PyTorch primitive, TensorDict, a flexible data carrier that enables the integration of the library's components while preserving their modularity. As a result, replay buffers, datasets, distributed data collectors, environments, transforms, and objectives can be used in isolation or combined. We provide a detailed description of the building blocks, supporting code examples, and an extensive overview of the library across domains and tasks. Finally, we show comparative benchmarks to demonstrate its computational efficiency. TorchRL fosters long-term support and is publicly available on GitHub for greater reproducibility and collaboration within the research community. The code is open-sourced at https://github.com/pytorch/rl.
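
To make the idea of a shared data carrier concrete, the following is a minimal sketch in plain Python. It is not the TensorDict or TorchRL API: the names `Batch`, `batch_size`, `toy_policy`, `toy_env_step`, and `ToyReplayBuffer` are hypothetical, and plain lists stand in for tensors. It only illustrates how components that read and write named entries of one keyed container can be chained together or used on their own.

```python
import random
from typing import Dict, List

# Hypothetical stand-in for a keyed data carrier (NOT the TensorDict API):
# every value is a list with the same length, the batch size.
Batch = Dict[str, List[float]]


def batch_size(batch: Batch) -> int:
    """Return the shared leading dimension, checking that all entries agree."""
    sizes = {len(values) for values in batch.values()}
    if len(sizes) != 1:
        raise ValueError(f"inconsistent batch sizes: {sorted(sizes)}")
    return sizes.pop()


def toy_policy(batch: Batch) -> Batch:
    """Toy 'policy': reads 'observation', writes 'action'."""
    out = dict(batch)
    out["action"] = [-0.5 * obs for obs in batch["observation"]]
    return out


def toy_env_step(batch: Batch) -> Batch:
    """Toy 'environment': reads 'observation' and 'action', writes the outcome."""
    out = dict(batch)
    out["next_observation"] = [
        obs + act for obs, act in zip(batch["observation"], batch["action"])
    ]
    out["reward"] = [-abs(obs) for obs in out["next_observation"]]
    return out


class ToyReplayBuffer:
    """Toy 'replay buffer': stores single transitions, returns carriers again."""

    def __init__(self) -> None:
        self._rows: List[Dict[str, float]] = []

    def __len__(self) -> int:
        return len(self._rows)

    def extend(self, batch: Batch) -> None:
        # Split the batch into one row per sample.
        for i in range(batch_size(batch)):
            self._rows.append({key: values[i] for key, values in batch.items()})

    def sample(self, n: int, rng: random.Random) -> Batch:
        # Sample with replacement and re-stack the rows into a carrier.
        rows = [rng.choice(self._rows) for _ in range(n)]
        return {key: [row[key] for row in rows] for key in rows[0]}


# Components combined: policy -> environment -> replay buffer.
data: Batch = {"observation": [1.0, -2.0, 4.0]}
data = toy_env_step(toy_policy(data))
buffer = ToyReplayBuffer()
buffer.extend(data)
sample = buffer.sample(2, random.Random(0))
```

Because each component depends only on the keys it reads and writes, any of them can be swapped out or called in isolation, which is the kind of modularity the abstract attributes to TensorDict.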
