Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics

Training data is the key ingredient for deep learning approaches, but it is difficult to obtain for the specialized domains often encountered in robotics. We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks such as semantic segmentation, object detection, and correspondence or pose estimation. Our approach uses physics simulation to arrange object meshes in physically realistic, dense scenes. The arranged scenes are rendered using high-quality rasterization with randomized appearance and material parameters. Noise and other transformations introduced by camera sensors are simulated. Our pipeline can be run online during the training of a deep neural network, which enables applications in life-long learning and in iterative render-and-compare approaches. We demonstrate the pipeline's usability by learning semantic segmentation on the challenging YCB-Video dataset without using any of its training frames; our method achieves performance comparable to a conventionally trained model. Additionally, we show a successful application in a real-world regrasping system.
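The online generation loop described above (randomize appearance, render, simulate the sensor, hand the frame to training) can be illustrated with a minimal sketch. This is not the Stillleben API: every name here (`sample_appearance`, `apply_sensor_noise`, `generate_batch`, `flat_render`) and every parameter range is a hypothetical placeholder, the physics-based arrangement step is omitted, and the renderer is a stand-in that returns a flat grayscale image.

```python
import random


def sample_appearance(rng):
    """Draw randomized material and lighting parameters (ranges are assumptions)."""
    return {
        "metallic": rng.uniform(0.0, 1.0),
        "roughness": rng.uniform(0.1, 1.0),
        "light_intensity": rng.uniform(0.5, 2.0),
    }


def apply_sensor_noise(image, rng, read_sigma=0.01, shot_scale=0.02):
    """Add Gaussian noise with a constant part and a signal-dependent part.

    `image` is a list of rows of floats in [0, 1]; the result is clamped
    back into [0, 1]. This is a simple stand-in for a camera model.
    """
    noisy = []
    for row in image:
        out_row = []
        for value in row:
            sigma = (read_sigma ** 2 + shot_scale ** 2 * value) ** 0.5
            out_row.append(min(1.0, max(0.0, value + rng.gauss(0.0, sigma))))
        noisy.append(out_row)
    return noisy


def generate_batch(render_fn, batch_size, seed=0):
    """Yield (noisy_image, params) pairs, one freshly randomized scene each."""
    rng = random.Random(seed)
    for _ in range(batch_size):
        params = sample_appearance(rng)
        image = render_fn(params)  # hypothetical renderer callback
        yield apply_sensor_noise(image, rng), params


def flat_render(params):
    """Stand-in renderer: a 3x4 image whose brightness follows the light."""
    level = min(1.0, 0.4 * params["light_intensity"])
    return [[level] * 4 for _ in range(3)]
```

Because `generate_batch` is a generator, a training loop could pull new frames on demand instead of reading a fixed dataset from disk, which is the property the abstract relies on for online use.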
