The RobotriX: An Extremely Photorealistic and Very-Large-Scale Indoor Dataset of Sequences with Robot Trajectories and Interactions

Enter the RobotriX, an extremely photorealistic indoor dataset designed to enable the application of deep learning techniques to a wide variety of robotic vision problems. The RobotriX consists of hyperrealistic indoor scenes that are explored by robot agents, which also interact with objects in a visually realistic manner within the simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset that captures gaze, so that a human operator can move the robot and operate its robotic hands with motion controllers; scene information is dumped on a per-frame basis so that it can be reproduced offline with UnrealCV to generate raw data and ground-truth labels. With this approach, we generated a dataset of 38 semantic classes across 512 sequences, totaling 8M stills recorded at 60+ frames per second in full HD resolution. For each frame, RGB-D and 3D information is provided, with full annotations in both spaces. Thanks to the high quality and quantity of both the raw data and the annotations, the RobotriX will serve as a new milestone for investigating 2D and 3D robotic vision tasks with large-scale data-driven techniques.
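As a rough illustration of the offline reproduction step, the sketch below uses the UnrealCV Python client to replay camera poses and dump RGB, depth, and instance-mask ground truth for each frame. The trajectory values and output file names are hypothetical placeholders, the scene is assumed to be a packaged Unreal Engine level running the UnrealCV plugin on the default local port, and exact command forms may vary across UnrealCV versions.

```python
# Minimal sketch: regenerate per-frame raw data and ground truth offline
# by replaying recorded camera poses through a running UnrealCV server.
# Trajectory values and file names below are illustrative, not from the paper.
from unrealcv import client

client.connect()
if not client.isconnected():
    raise RuntimeError('Could not connect to the UnrealCV server')

# Hypothetical recorded trajectory: (x, y, z, pitch, yaw, roll) per frame,
# standing in for the per-frame scene dump described above.
trajectory = [
    (0.0, 0.0, 100.0, 0.0, 0.0, 0.0),
    (10.0, 0.0, 100.0, 0.0, 5.0, 0.0),
]

for i, (x, y, z, pitch, yaw, roll) in enumerate(trajectory):
    # Replay the recorded camera pose for this frame.
    client.request(f'vset /camera/0/location {x} {y} {z}')
    client.request(f'vset /camera/0/rotation {pitch} {yaw} {roll}')

    # Dump raw data and ground truth for the same viewpoint.
    client.request(f'vget /camera/0/lit frame_{i:06d}_rgb.png')           # photorealistic RGB
    client.request(f'vget /camera/0/depth frame_{i:06d}_depth.exr')       # floating-point depth
    client.request(f'vget /camera/0/object_mask frame_{i:06d}_mask.png')  # per-object instance mask

client.disconnect()
```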
