Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

Recently, there has been growing interest in developing learning-based methods to detect and utilize salient semi-global or global structures, such as junctions, lines, planes, cuboids, smooth surfaces, and all types of symmetries, for 3D scene modeling and understanding. However, the ground truth annotations are often obtained via human labor, which is particularly challenging and inefficient for such tasks due to the large number of 3D structure instances (e.g., line segments) and other factors such as viewpoints and occlusions. In this paper, we present a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks. We take advantage of the availability of professional interior designs and automatically extract 3D structures from them. We generate high-quality images with an industry-leading rendering engine. We use our synthetic dataset in combination with real images to train deep networks for room layout estimation and demonstrate improved performance on benchmark datasets.

[1]  A. Witkin,et al.  On the Role of Structure in Vision , 1983 .

[2]  Pat Hanrahan,et al.  Ray tracing on programmable graphics hardware , 2002, SIGGRAPH Courses.

[3]  Krista A. Ehinger,et al.  Recognizing scene viewpoint using panoramic place representation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Jianxiong Xiao,et al.  Localizing 3D cuboids in single-view images , 2012, NIPS.

[5]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[6]  Yinda Zhang,et al.  PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding , 2014, ECCV.

[7]  Ingo Wald,et al.  Embree: a kernel framework for efficient CPU ray tracing , 2014, ACM Trans. Graph..

[8]  Gregory S. Johnson,et al.  Embree , 2014 .

[9]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  John J. Leonard,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[11]  Germán Ros,et al.  Training Constrained Deconvolutional Networks for Road Scene Semantic Segmentation , 2016, ArXiv.

[12]  John J. Leonard,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[13]  Tomasz Malisiewicz,et al.  Deep Cuboid Detection: Beyond 2D Bounding Boxes , 2016, ArXiv.

[14]  Silvio Savarese,et al.  3D Semantic Parsing of Large-Scale Indoor Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Ersin Yumer,et al.  Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Tomasz Malisiewicz,et al.  RoomNet: End-to-End Room Layout Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Silvio Savarese,et al.  Joint 2D-3D-Semantic Data for Indoor Scene Understanding , 2017, ArXiv.

[18]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[19]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Stefan Leutenegger,et al.  SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation? , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Leonidas J. Guibas,et al.  Learning Shape Abstractions by Assembling Volumetric Primitives , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jiajun Wu,et al.  Raster-to-Vector: Revisiting Floorplan Transformation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ming-Hsuan Yang,et al.  Learning to Adapt Structured Output Space for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Ersin Yumer,et al.  PlaneNet: Piece-Wise Planar Reconstruction from a Single RGB Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Zihan Zhou,et al.  Recovering 3D Planes from a Single Image via Convolutional Neural Networks , 2018, ECCV.

[27]  Song-Chun Zhu,et al.  Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image , 2018, ECCV.

[28]  Mathieu Aubry,et al.  A Papier-Mache Approach to Learning 3D Surface Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Chen Liu,et al.  FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans , 2018, ECCV.

[30]  Yuandong Tian,et al.  3D Interpreter Networks for Viewer-Centered Wireframe Modeling , 2018, International Journal of Computer Vision.

[31]  Derek Hoiem,et al.  LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Kun Huang,et al.  Learning to Parse Wireframes in Images of Man-Made Environments , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Wenbin Li,et al.  InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset , 2018, BMVC.

[34]  Luc Van Gool,et al.  Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Jiacheng Chen,et al.  Floor-SP: Inverse CAD for Floorplans by Sequential Room-Wise Shortest Path , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Angel X. Chang,et al.  PlanIT: planning and instantiating indoor scenes with relation graph and spatial prior networks , 2019, ACM Trans. Graph..

[37]  Derek Hoiem,et al.  3D Manhattan Room Layout Reconstruction from a Single 360 Image , 2019, ArXiv.

[38]  Cheng Sun,et al.  HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Silvio Savarese,et al.  3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[40]  Zihan Zhou,et al.  Single-Image Piece-Wise Planar 3D Reconstruction via Associative Embedding , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Peter Wonka,et al.  DuLa-Net: A Dual-Projection Network for Estimating Room Layouts From a Single RGB Panorama , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jan Kautz,et al.  PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Li-Yi Wei,et al.  Learning to Reconstruct 3D Manhattan Wireframes From a Single Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).