Single-Shot Cuboids: Geodesics-based End-to-end Manhattan Aligned Layout Estimation from Spherical Panoramas

Abstract

It has been shown that global scene understanding tasks like layout estimation can benefit from wider fields of view, and specifically from spherical panoramas. While much progress has been made recently, all previous approaches rely on intermediate representations and postprocessing to produce Manhattan-aligned estimates. In this work we show how to estimate full room layouts in a single shot, eliminating the need for postprocessing. Our work is the first to directly infer Manhattan-aligned outputs. To achieve this, our data-driven model exploits direct coordinate regression and is supervised end-to-end. As a result, we can explicitly add quasi-Manhattan constraints, which set the necessary conditions for a homography-based Manhattan alignment module. Finally, we introduce geodesic heatmaps and a geodesic loss, as well as a boundary-aware center of mass calculation, which together facilitate higher-quality keypoint estimation in the spherical domain. Our models and code are publicly available at https://github.com/VCL3D/SingleShotCuboids.
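The abstract names geodesic heatmaps and a boundary-aware center of mass without defining them. The following is a minimal sketch under stated assumptions, not the paper's formulation or the released code. It assumes an equirectangular grid, takes a "geodesic heatmap" to be a Gaussian of the great-circle distance to a keypoint (with `sigma` in radians), and uses a heatmap-weighted mean of 3D unit vectors as one possible seam-aware center of mass. All function names here are hypothetical.

```python
import math


def pixel_to_sphere(x, y, width, height):
    """Map an equirectangular pixel centre to (longitude, latitude) in radians.
    Longitude spans [-pi, pi) left to right; latitude runs pi/2 to -pi/2 top to bottom."""
    lon = ((x + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((y + 0.5) / height) * math.pi
    return lon, lat


def great_circle_distance(lon1, lat1, lon2, lat2):
    """Geodesic (angular) distance on the unit sphere, haversine form.
    sin^2 is periodic, so longitude wrap-around is handled automatically."""
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))


def geodesic_heatmap(center_lon, center_lat, width, height, sigma):
    """Hypothetical geodesic heatmap: Gaussian of the great-circle distance
    to the keypoint, normalised so all entries sum to 1."""
    hm = [[0.0] * width for _ in range(height)]
    total = 0.0
    for y in range(height):
        for x in range(width):
            lon, lat = pixel_to_sphere(x, y, width, height)
            d = great_circle_distance(center_lon, center_lat, lon, lat)
            v = math.exp(-(d ** 2) / (2.0 * sigma ** 2))
            hm[y][x] = v
            total += v
    return [[v / total for v in row] for row in hm]


def spherical_center_of_mass(hm):
    """Seam-aware centre of mass (an assumed variant): average 3D unit vectors
    weighted by the heatmap and by cos(lat), which approximates the area of an
    equirectangular pixel, then map the mean direction back to (lon, lat)."""
    height, width = len(hm), len(hm[0])
    sx = sy = sz = 0.0
    for y in range(height):
        for x in range(width):
            lon, lat = pixel_to_sphere(x, y, width, height)
            w = hm[y][x] * math.cos(lat)
            sx += w * math.cos(lat) * math.cos(lon)
            sy += w * math.cos(lat) * math.sin(lon)
            sz += w * math.sin(lat)
    return math.atan2(sy, sx), math.atan2(sz, math.hypot(sx, sy))


# Usage: a keypoint just left of the +pi / -pi seam, so its heatmap wraps
# around onto the opposite image border.
hm = geodesic_heatmap(math.pi - 0.05, 0.3, 64, 32, sigma=0.15)
lon, lat = spherical_center_of_mass(hm)
```

Averaging raw pixel column indices would place a heatmap that straddles the left and right image borders near the image center. Averaging directions on the sphere avoids this, because the weighted mean of a distribution that is rotationally symmetric about a point points back at that point.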
