SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans

We present a novel approach to reconstructing lightweight, CAD-based representations of scanned 3D environments from commodity RGB-D sensors. Our key idea is to jointly optimize for both CAD model alignments as well as layout estimations of the scanned scene, explicitly modeling inter-relationships between objects-to-objects and objects-to-layout. Since object arrangement and scene layout are intrinsically coupled, we show that treating the problem jointly significantly helps to produce globally-consistent representations of a scene. Object CAD models are aligned to the scene by establishing dense correspondences between geometry, and we introduce a hierarchical layout prediction approach to estimate layout planes from corners and edges of the this http URL this end, we propose a message-passing graph neural network to model the inter-relationships between objects and layout, guiding generation of a globally object alignment in a scene. By considering the global scene layout, we achieve significantly improved CAD alignments compared to state-of-the-art methods, improving from 41.83% to 58.41% alignment accuracy on SUNCG and from 50.05% to 61.24% on ScanNet, respectively. The resulting CAD-based representations makes our method well-suited for applications in content creation such as augmented- or virtual reality.

[1]  Ersin Yumer,et al.  PlaneNet: Piece-Wise Planar Reconstruction from a Single RGB Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Jörg Stückler,et al.  Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors , 2016, GCPR.

[3]  Abhinav Gupta,et al.  3D-RelNet: Joint Object and Relational Network for 3D Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  C. Goodall Procrustes methods in the statistical analysis of shape , 1991 .

[5]  Bertram Drost,et al.  3D Object Detection and Localization Using Multimodal Point Pair Features , 2012, 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission.

[6]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[7]  Federico Tombari,et al.  Unique Signatures of Histograms for Local Surface Description , 2010, ECCV.

[8]  Matthias Nießner,et al.  3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Marc Pollefeys,et al.  Indoor Scan2BIM: Building information models of house interiors , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[11]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[12]  Matthias Nießner,et al.  Scan2CAD: Learning CAD Model Alignment in RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Leonidas J. Guibas,et al.  Database‐Assisted Object Retrieval for Real‐Time 3D Reconstruction , 2015, Comput. Graph. Forum.

[14]  Matthias Nießner,et al.  Scan2Mesh: From Unstructured Range Scans to 3D Meshes , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Matthias Nießner,et al.  End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Matthias Nießner,et al.  RIO: 3D Object Instance Re-Localization in Changing Indoor Environments , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Jitendra Malik,et al.  Aligning 3D models to RGB-D images of cluttered scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jiajun Wu,et al.  Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Matthias Nießner,et al.  BundleFusion , 2016, TOGS.

[20]  Song-Chun Zhu,et al.  Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image , 2018, ECCV.

[21]  Pierre Vandergheynst,et al.  Geometric Deep Learning: Going beyond Euclidean data , 2016, IEEE Signal Process. Mag..

[22]  Silvio Savarese,et al.  3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[24]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[25]  Silvio Savarese,et al.  4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Vladlen Koltun,et al.  Robust reconstruction of indoor scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Leonidas J. Guibas,et al.  ObjectNet3D: A Large Scale Database for 3D Object Recognition , 2016, ECCV.

[28]  Chen Liu,et al.  FloorNet: A Unified Framework for Floorplan Reconstruction from 3D Scans , 2018, ECCV.

[29]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Slobodan Ilic,et al.  DPOD: 6D Pose Object Detector and Refiner , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Derek Hoiem,et al.  Complete 3D Scene Parsing from an RGBD Image , 2018, International Journal of Computer Vision.

[32]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Daniel Cohen-Or,et al.  ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning , 2018, ACM Trans. Graph..

[34]  Jan Kautz,et al.  PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[36]  Nico Blodow,et al.  Fast Point Feature Histograms (FPFH) for 3D registration , 2009, 2009 IEEE International Conference on Robotics and Automation.

[37]  Leonidas J. Guibas,et al.  Guided Real‐Time Scanning of Indoor Objects , 2013, Comput. Graph. Forum.

[38]  R. Zemel,et al.  Neural Relational Inference for Interacting Systems , 2018, ICML.

[39]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[40]  Antonio Torralba,et al.  Parsing IKEA Objects: Fine Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[41]  Jiacheng Chen,et al.  Floor-SP: Inverse CAD for Floorplans by Sequential Room-Wise Shortest Path , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Matthias Nießner,et al.  3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).