ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots

ROBEL is an open-source platform of cost-effective robots designed for reinforcement learning in the real world. ROBEL introduces two robots, each aimed at accelerating reinforcement learning research in a different task domain: D'Claw is a three-fingered hand robot that facilitates learning dexterous manipulation tasks, and D'Kitty is a four-legged robot that facilitates learning agile legged locomotion tasks. These low-cost, modular robots are easy to maintain and are robust enough to sustain on-hardware reinforcement learning from scratch, with over 14,000 training hours registered on them to date. To leverage this platform, we propose an extensible set of continuous control benchmark tasks for each robot. These tasks feature dense and sparse task objectives, and additionally introduce score metrics for hardware safety. We provide benchmark scores on an initial set of tasks using a variety of learning-based methods. Furthermore, we show that these results can be replicated across copies of the robots located in different institutions. Code, documentation, design files, detailed assembly instructions, final policies, baseline details, task videos, and all supplementary materials required to reproduce the results are available at www.roboticsbenchmarks.org.
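The distinction between dense and sparse task objectives mentioned above can be sketched with a minimal, hypothetical example (this is not the official ROBEL reward code; the valve-angle framing and tolerance value are illustrative assumptions):

```python
def dense_reward(angle, target):
    """Dense objective: negative distance to the target valve angle,
    providing a smooth learning signal at every step."""
    return -abs(angle - target)

def sparse_reward(angle, target, tol=0.1):
    """Sparse objective: reward of 1.0 only once the valve is within
    tolerance of the target, and 0.0 otherwise."""
    return 1.0 if abs(angle - target) <= tol else 0.0
```

Dense objectives ease exploration during on-hardware training from scratch, while sparse objectives more directly measure task success; the benchmarks provide both.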
