OmniLayout: Room Layout Reconstruction from Indoor Spherical Panoramas

Given a single RGB panorama, the goal of 3D layout reconstruction is to estimate the room layout by predicting the corners, the floor boundary, and the ceiling boundary. A common approach is to use standard convolutional networks to predict the corners and boundaries, followed by post-processing to generate the 3D layout. However, the spatially varying distortions in panoramic images are incompatible with the translational equivariance of standard convolutions, which degrades performance. Instead, we propose to use spherical convolutions. The resulting network, which we call OmniLayout, performs convolutions directly on the sphere surface, sampling according to the inverse equirectangular projection, and is hence invariant to equirectangular distortions. Using a new evaluation metric, we show that our network reduces the error in the heavily distorted regions (near the poles) by ≈25% compared to standard convolutional networks. Experimental results show that OmniLayout outperforms the state of the art by ≈4% on two different benchmark datasets (PanoContext and Stanford 2D-3D). Code is available at https://github.com/rshivansh/OmniLayout.
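To make the sampling idea concrete, the following is a minimal sketch of how a distortion-aware kernel can choose its sampling locations on an equirectangular image. It assumes a gnomonic (tangent-plane) kernel, as used in SphereNet-style spherical convolutions; the abstract does not specify OmniLayout's exact sampling scheme, so the function names `kernel_sampling_coords` and `to_pixel`, the parameter `delta`, and the kernel size are illustrative assumptions, not the released implementation.

```python
import numpy as np

def kernel_sampling_coords(lat0, lon0, delta, k=3):
    """Sampling locations (lat, lon) of a k x k kernel centred at (lat0, lon0).

    Assumption: the kernel is a regular grid on the plane tangent to the
    sphere at the centre, with spacing tan(delta), and is mapped back to
    the sphere by the inverse gnomonic projection.
    """
    half = k // 2
    offsets = np.arange(-half, half + 1) * np.tan(delta)
    x, y = np.meshgrid(offsets, offsets)      # x: east offset, y: north offset
    cos_c = 1.0 / np.sqrt(1.0 + x**2 + y**2)  # c = arctan(rho), rho = |(x, y)|
    # Inverse gnomonic projection, simplified using sin(c) = rho * cos(c).
    lat = np.arcsin(cos_c * (np.sin(lat0) + y * np.cos(lat0)))
    lon = lon0 + np.arctan2(x, np.cos(lat0) - y * np.sin(lat0))
    return lat, lon

def to_pixel(lat, lon, height, width):
    """Map (lat, lon) in radians to equirectangular pixel coordinates (u, v)."""
    u = ((lon / (2.0 * np.pi) + 0.5) * width) % width  # wrap around the seam
    v = (0.5 - lat / np.pi) * height                   # v = 0 at the north pole
    return u, v

delta = np.pi / 256                                    # angular step (assumed)
lat_eq, lon_eq = kernel_sampling_coords(0.0, 0.0, delta)
lat_hi, lon_hi = kernel_sampling_coords(1.4, 0.0, delta)  # about 80 degrees north
span_eq = lon_eq[1, 2] - lon_eq[1, 0]   # longitude covered by the middle row
span_hi = lon_hi[1, 2] - lon_hi[1, 0]
```

In this sketch the kernel covers the same solid angle everywhere on the sphere, so its footprint in the equirectangular image widens toward the poles (`span_hi` is several times `span_eq`). A standard convolution instead keeps a fixed pixel footprint, which corresponds to a shrinking region of the scene at high latitudes.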
