DeepMind Control Suite

The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. We include benchmarks for several learning algorithms. The Control Suite is publicly available at this https URL A video summary of all tasks is available at this http URL .

[1]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[2]  Jing Peng,et al.  Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .

[3]  Karl Sims,et al.  Evolving virtual creatures , 1994, SIGGRAPH.

[4]  Mark W. Spong,et al.  The swing up control problem for the Acrobot , 1995 .

[5]  David K. Smith,et al.  Dynamic Programming and Optimal Control. Volume 1 , 1996 .

[6]  Rémi Coulom,et al.  Reinforcement Learning Using Neural Networks, with Applications to Motor Control. (Apprentissage par renforcement utilisant des réseaux de neurones, avec des applications au contrôle moteur) , 2002 .

[7]  Pawel Wawrzynski,et al.  Real-time reinforcement learning by sequential Actor-Critics and experience replay , 2009, Neural Networks.

[8]  Yuval Tassa,et al.  Stochastic Complementarity for Local Control of Discontinuous Dynamics , 2010, Robotics: Science and Systems.

[9]  Yuval Tassa,et al.  Synthesis and stabilization of complex behaviors through online trajectory optimization , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[10]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[11]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[12]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[13]  Yuval Tassa,et al.  Simulation tools for model-based robotics: Comparison of Bullet, Havok, MuJoCo, ODE and PhysX , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..

[15]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[16]  Pieter Abbeel,et al.  Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.

[17]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[18]  Tom Schaul,et al.  Prioritized Experience Replay , 2015, ICLR.

[19]  Marc G. Bellemare,et al.  A Distributional Perspective on Reinforcement Learning , 2017, ICML.

[20]  Peter Henderson,et al.  Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control , 2017, ArXiv.

[21]  Yuval Tassa,et al.  Learning human behaviors from motion capture by adversarial imitation , 2017, ArXiv.

[22]  Yuval Tassa,et al.  Data-efficient Deep Reinforcement Learning for Dexterous Manipulation , 2017, ArXiv.

[23]  Philip Bachman,et al.  Deep Reinforcement Learning that Matters , 2017, AAAI.