DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

Panorama images have a much larger field-of-view thus naturally encode enriched scene context information compared to standard perspective images, which however is not well exploited in the previous scene understanding methods. In this paper, we propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image. In order to fully utilize the rich context information, we design a novel graph neural network based context model to predict the relationship among objects and room layout, and a differentiable relationship-based optimization module to optimize object arrangement with well-designed objective functions on-the-fly. Realizing the existing data are either with incomplete ground truth or overly-simplified scene, we present a new synthetic dataset with good diversity in room layout and furniture placement, and realistic image quality for total panoramic 3D scene understanding. Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement. Code is available at https://chengzhag.github.io/publication/dpc.

[1]  Yinda Zhang,et al.  DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Song-Chun Zhu,et al.  Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation , 2018, NeurIPS.

[3]  Zihan Zhou,et al.  Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling , 2019, ECCV.

[4]  Cheng Sun,et al.  HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[6]  Leonidas J. Guibas,et al.  GRASS: Generative Recursive Autoencoders for Shape Structures , 2017, ACM Trans. Graph..

[7]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Jianlong Fu,et al.  360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[9]  Silvio Savarese,et al.  Joint 2D-3D-Semantic Data for Indoor Scene Understanding , 2017, ArXiv.

[10]  Ersin Yumer,et al.  SeeThrough: Finding Chairs in Heavily Occluded Indoor Scene Images , 2017 .

[11]  Yichen Wei,et al.  Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Johnny Huynh,et al.  Separating Axis Theorem for Oriented Bounding Boxes , 2009 .

[13]  Yinda Zhang,et al.  PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding , 2014, ECCV.

[14]  Shi Jin,et al.  Automatic 3D Indoor Scene Modeling from Single Panorama , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  T. Kanade,et al.  Geometric reasoning for single image structure recovery , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Hui Zhang,et al.  Efficient 3D Room Shape Recovery from a Single Panorama , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[18]  Mathieu Aubry,et al.  AtlasNet: A Papier-M\^ach\'e Approach to Learning 3D Surface Generation , 2018, CVPR 2018.

[19]  Alan L. Yuille,et al.  Manhattan World: compass direction from a single image by Bayesian inference , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[20]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Xiaoguang Han,et al.  Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Cheng Zhang,et al.  Holistic 3D Scene Understanding from a Single Image with Implicit Representation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Dieter Fox,et al.  Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects , 2018, CoRL.

[24]  Enrico Gobbetti,et al.  Omnidirectional image capture on mobile devices for fast automatic generation of 2.5D indoor maps , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[25]  Lixin Fan,et al.  Object Detection in Equirectangular Panorama , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[26]  Chen Kong,et al.  Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction , 2017, AAAI.

[27]  Silvio Savarese,et al.  iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes , 2020, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[28]  Derek Hoiem,et al.  LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[30]  Tomasz Malisiewicz,et al.  RoomNet: End-to-End Room Layout Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Thomas Funkhouser,et al.  Local Deep Implicit Functions for 3D Shape , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Song-Chun Zhu,et al.  Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Stefan Lee,et al.  Graph R-CNN for Scene Graph Generation , 2018, ECCV.

[34]  Yoshihiko Mochizuki,et al.  Room reconstruction from a single spherical image by higher-order energy minimization , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[35]  James L. Crowley,et al.  Symmetry Aware Evaluation of 3D Object Detection and Pose Estimation in Scenes of Many Parts in Bulk , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[36]  Zhijian Liu,et al.  Learning to Exploit Stability for 3D Scene Parsing , 2018, NeurIPS.

[37]  Jaishanker K. Pillai,et al.  Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Svetlana Lazebnik,et al.  Learning Informative Edge Maps for Indoor Scene Layout Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39]  Kobus Barnard,et al.  Understanding Bayesian Rooms Using Composite 3D Object Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Matthias Nießner,et al.  RIO: 3D Object Instance Re-Localization in Changing Indoor Environments , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Derek Hoiem,et al.  Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[42]  Daniel Fried,et al.  Bayesian geometric modeling of indoor scenes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Krista A. Ehinger,et al.  Recognizing scene viewpoint using panoramic place representation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Björn Stenger,et al.  Pano2CAD: Room Layout from a Single Panorama Image , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[45]  Peter Wonka,et al.  DuLa-Net: A Dual-Projection Network for Estimating Room Layouts From a Single RGB Panorama , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Xiaojian Liu,et al.  The Research of Collision Detection Algorithm Based on Separating axis Theorem , 2015 .

[47]  Angela Dai,et al.  SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans , 2020, ECCV.

[48]  Song-Chun Zhu,et al.  Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image , 2018, ECCV.

[49]  Silvio Savarese,et al.  DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).