RB2: Robotic Manipulation Benchmarking with a Twist

Benchmarks offer a scientific way to compare algorithms using objective performance metrics. Good benchmarks have two features: (a) they should be widely useful for many research groups; and (b) they should produce reproducible findings. In robotic manipulation research, there is a trade-off between reproducibility and broad accessibility. If the benchmark is kept restrictive (fixed hardware, fixed objects), the numbers are reproducible but the setup becomes less general. On the other hand, a benchmark could be a loose set of protocols (e.g. an object set [9]), but the underlying variation in setups makes the results non-reproducible. In this paper, we re-imagine benchmarking for robotic manipulation as state-of-the-art (SOTA) algorithmic implementations, alongside the usual set of tasks and experimental protocols. The added baseline implementations provide a way to easily recreate SOTA numbers in a new local robotic setup, thus providing credible relative rankings between existing approaches and new work. However, these “local rankings” could vary between different setups. To resolve this issue, we build a mechanism for pooling experimental data between labs, and thus establish a single global ranking for existing (and proposed) SOTA algorithms. Our benchmark, called Ranking-Based Robotics Benchmark (RB2), is evaluated on tasks inspired by the clinically validated Southampton Hand Assessment Procedures [27]. The benchmark was run across two different labs and reveals several surprising findings. For example, extremely simple baselines like open-loop behavior cloning outperform more complicated models (e.g. closed-loop, RNN, offline RL) that are preferred by the field. We hope our fellow researchers will use RB2 to improve the quality and rigor of their research.
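
The abstract does not spell out how local rankings are pooled into a global one. As a rough illustration only, the sketch below assumes a Plackett-Luce ranking model (the reference list includes Plackett's "The Analysis of Permutations") fitted with a standard minorization-maximization update. The function name `plackett_luce_mm`, the `prior` pseudo-count, the method names, and the toy rankings are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def plackett_luce_mm(rankings, prior=0.1, iters=500, tol=1e-10):
    """Fit Plackett-Luce strengths to a pooled list of rankings.

    Each ranking is a list of method names, best first. Assumes every
    method appears in at least one ranking with two or more entries.
    `prior` is a small pseudo-count (an assumption of this sketch) that
    keeps every strength strictly positive.
    """
    items = sorted({m for r in rankings for m in r})
    w = {m: 1.0 / len(items) for m in items}

    # Count how often each method is placed ahead of at least one other.
    wins = {m: prior for m in items}
    for r in rankings:
        for m in r[:-1]:
            wins[m] += 1.0

    for _ in range(iters):
        denom = defaultdict(float)
        for r in rankings:
            # Stage t: the t-th place is chosen among the methods r[t:].
            for t in range(len(r) - 1):
                remaining = r[t:]
                inv = 1.0 / sum(w[m] for m in remaining)
                for m in remaining:
                    denom[m] += inv
        new_w = {m: wins[m] / denom[m] for m in items}
        total = sum(new_w.values())
        new_w = {m: v / total for m, v in new_w.items()}
        delta = max(abs(new_w[m] - w[m]) for m in items)
        w = new_w
        if delta < tol:
            break
    return w


# Made-up local rankings from two hypothetical labs (not real results).
lab_1 = [["method_a", "method_b", "method_c"],
         ["method_a", "method_c", "method_b"]]
lab_2 = [["method_b", "method_a", "method_c"],
         ["method_a", "method_b", "method_c"]]

# Pooling here simply means fitting one model to both labs' rankings.
weights = plackett_luce_mm(lab_1 + lab_2)
global_ranking = sorted(weights, key=weights.get, reverse=True)
print(global_ranking)
```

Fitting one model to the concatenated rankings is the simplest form of pooling; a real protocol might also weight labs by trial count or model per-lab effects, which this sketch ignores.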

[1]  R. Plackett. The Analysis of Permutations, 1975.

[2]  P. H. Chappell, et al. The Southampton Hand: an intelligent myoelectric prosthesis, 1994, Journal of rehabilitation research and development.

[3]  Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[4]  Hiroaki Kitano, et al. RoboCup: The Robot World Cup Initiative, 1997, AGENTS '97.

[5]  E. Todorov, et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, 2005, Proceedings of the 2005 American Control Conference.

[6]  S. Schaal. Dynamic Movement Primitives - A Framework for Motor Control in Humans and Humanoid Robotics, 2006.

[7]  Yasemin Altun, et al. Relative Entropy Policy Search, 2010.

[8]  N. Takahashi. Aging, 1992, Cell.

[9]  P. Cochat, et al. Et al, 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[10]  Jan Peters, et al. Hierarchical Relative Entropy Policy Search, 2014, AISTATS.

[11]  Jeremy A. Marvel, et al. Technology readiness levels for randomized bin picking, 2012, PerMIS.

[12]  Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[13]  Satoshi Endo, et al. Dynamic Movement Primitives for Human-Robot interaction: Comparison with human behavioral observation, 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14]  B. Fernhall, et al. Aging, hypertension and physiological tremor: The contribution of the cardioballistic impulse to tremorgenesis in older adults, 2013, Journal of the Neurological Sciences.

[15]  P. Abbeel, et al. Benchmarking in Manipulation Research: The YCB Object and Model Set and Benchmarking Protocols, 2015, ArXiv.

[16]  Sergey Levine, et al. Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection, 2016, ISER.

[17]  Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.

[18]  Sergey Levine, et al. Deep spatial autoencoders for visuomotor learning, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[19]  Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.

[20]  Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.

[21]  Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.

[22]  Rouhollah Rahmatizadeh, et al. From Virtual Demonstration to Real-World Manipulation Using LSTM and MDN, 2016, AAAI.

[23]  Joseph Falco, et al. Performance Metrics and Test Methods for Robotic Hands, 2018.

[24]  Oliver Brock, et al. Analysis and Observations From the First Amazon Picking Challenge, 2016, IEEE Transactions on Automation Science and Engineering.

[25]  S. Levine, et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning, 2019, CoRL.

[26]  Sergey Levine, et al. REPLAB: A Reproducible Low-Cost Arm Benchmark Platform for Robotic Learning, 2019, ArXiv.

[27]  S. Levine, et al. ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, 2019, CoRL.

[28]  Abhinav Gupta, et al. PyRobot: An Open-source Robotics Framework for Research and Benchmarking, 2019, ArXiv.

[29]  Scott Niekum, et al. Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations, 2019, CoRL.

[30]  S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.

[31]  Sergey Levine, et al. The Ingredients of Real-World Robotic Reinforcement Learning, 2020, ICLR.

[32]  T. Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.

[33]  Danica Kragic, et al. Benchmarking Bimanual Cloth Manipulation, 2020, IEEE Robotics and Automation Letters.

[34]  Ugo Pattacini, et al. GRASPA 1.0: GRASPA is a Robot Arm graSping Performance BenchmArk, 2020, IEEE Robotics and Automation Letters.

[35]  Aude Billard, et al. Benchmark for Bimanual Robotic Manipulation of Semi-Deformable Objects, 2020, IEEE Robotics and Automation Letters.

[36]  Abhinav Gupta, et al. Neural Dynamic Policies for End-to-End Sensorimotor Learning, 2020, NeurIPS.

[37]  Roberto Martín-Martín, et al. robosuite: A Modular Simulation Framework and Benchmark for Robot Learning, 2020, ArXiv.

[38]  Kaiyu Hang, et al. Benchmarking Cluttered Robot Pick-and-Place Manipulation With the Box and Blocks Test, 2020, IEEE Robotics and Automation Letters.

[39]  Kaiyu Hang, et al. Benchmarking Protocol for Grasp Planning Algorithms, 2020, IEEE Robotics and Automation Letters.

[40]  Andrew J. Davison, et al. RLBench: The Robot Learning Benchmark & Learning Environment, 2019, IEEE Robotics and Automation Letters.

[41]  Justin Fu, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, ArXiv.

[42]  Oliver Brock, et al. Benchmarking Hand and Grasp Resilience to Dynamic Loads, 2020, IEEE Robotics and Automation Letters.

[43]  Guillermo Heredia, et al. Benchmarks for Aerial Manipulation, 2020, IEEE Robotics and Automation Letters.

[44]  Leonidas J. Guibas, et al. SAPIEN: A SimulAted Part-Based Interactive ENvironment, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Roozbeh Mottaghi, et al. ManipulaTHOR: A Framework for Visual Object Manipulation, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Deepak Pathak, et al. Hierarchical Neural Dynamic Policies, 2021, Robotics: Science and Systems.

[47]  Joseph J. Lim, et al. IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks, 2019, 2021 IEEE International Conference on Robotics and Automation (ICRA).

[48]  Siddhartha S. Srinivasa, et al. Benchmarking Structured Policies and Policy Optimization for Real-World Dexterous Object Manipulation, 2021, IEEE Robotics and Automation Letters.