AtlantaNet: Inferring the 3D Indoor Layout from a Single $360^\circ$ Image Beyond the Manhattan World Assumption

We introduce a novel end-to-end approach to predict a 3D room layout from a single panoramic image. Compared to recent state-of-the-art works, our method is not limited to Manhattan World environments, and can reconstruct rooms bounded by vertical walls that do not form right angles or are curved, i.e., Atlanta World models. In our approach, we project the original gravity-aligned panoramic image onto two horizontal planes, one above and one below the camera. This representation encodes all the information needed to recover the Atlanta World 3D bounding surfaces of the room in the form of a 2D room footprint on the floor plan and a room height. To predict the 3D layout, we propose an encoder-decoder neural network architecture that leverages Recurrent Neural Networks (RNNs) to capture long-range geometric patterns and exploits a customized training strategy based on domain-specific knowledge. The experimental results demonstrate that our method outperforms state-of-the-art solutions in prediction accuracy, in particular in cases of complex wall layouts or curved wall footprints.
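
The projection of a gravity-aligned panorama onto a horizontal plane can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes an equirectangular panorama, a camera at the origin with the z axis pointing up, and a hypothetical helper `pano_pixel_to_plane` whose name, parameters, and coordinate conventions are chosen here only for illustration. A pixel is converted to a viewing ray, and the ray is intersected with the plane `z = plane_z` (negative for the floor plane, positive for the ceiling plane).

```python
import math


def pano_pixel_to_plane(u, v, width, height, plane_z):
    """Project an equirectangular pixel onto the horizontal plane z = plane_z.

    Assumed conventions (illustrative, not taken from the paper):
    - u in [0, width) maps to longitude in [-pi, pi)
    - v in [0, height) maps to latitude from +pi/2 (top) to -pi/2 (bottom)
    - the camera sits at the origin, z points up
    Returns (x, y) on the plane, or None if the ray never reaches it.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi

    dz = math.sin(lat)  # vertical component of the unit viewing ray
    if dz == 0.0:
        return None  # ray is horizontal, parallel to the plane

    t = plane_z / dz  # ray parameter at the intersection
    if t <= 0.0:
        return None  # plane lies on the other side of the horizon

    horiz = t * math.cos(lat)  # horizontal distance from the camera axis
    return (horiz * math.sin(lon), horiz * math.cos(lon))


# A pixel 45 degrees below the horizon, straight ahead, with the floor
# assumed to lie 1.6 units below the camera.
floor_point = pano_pixel_to_plane(512, 384, 1024, 512, plane_z=-1.6)
# A pixel above the horizon cannot hit the floor plane.
no_hit = pano_pixel_to_plane(512, 128, 1024, 512, plane_z=-1.6)
```

Applying the same mapping with a positive `plane_z` gives the projection on the plane above the camera, so the floor and ceiling views share one 2D footprint coordinate frame.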
