Estimating Generic 3D Room Structures from 2D Annotations

Indoor rooms are among the most common use cases in 3D scene understanding. Current state-of-the-art methods for this task are driven by large annotated datasets. Room layouts are especially important, consisting of structural elements in 3D, such as wall, floor, and ceiling. However, they are difficult to annotate, especially on pure RGB video. We propose a novel method to produce generic 3D room layouts just from 2D segmentation masks, which are easy to annotate for humans. Based on these 2D annotations, we automatically reconstruct 3D plane equations for the structural elements and their spatial extent in the scene, and connect adjacent elements at the appropriate contact edges. We annotate and publicly release 2266 3D room layouts on the RealEstate10k dataset, containing YouTube videos. We demonstrate the high quality of these 3D layouts annotations with extensive experiments.

[1]  Sergey Zakharov,et al.  Photo-realistic Neural Domain Randomization , 2022, ECCV.

[2]  Leon L. Xu,et al.  ABO: Dataset and Benchmarks for Real-World 3D Object Understanding , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Patrick Labatut,et al.  Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  S. B. Kang,et al.  Zillow Indoor Dataset: Annotated Floor Plans With 360° Panoramas and 3D Room Layouts , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Vladlen Koltun,et al.  Enhancing Photorealism Enhancement , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Peter Wonka,et al.  Manhattan Room Layout Reconstruction from a Single $360^{\circ }$ Image: A Comparative Study of State-of-the-Art Methods , 2021, International Journal of Computer Vision.

[7]  Matthias Grundmann,et al.  Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Vittorio Ferrari,et al.  Vid2CAD: CAD Model Alignment Using Multi-View Constraints From Videos , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Marc Pollefeys,et al.  DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal Fusion , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Peng Liu,et al.  3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Lin Gao,et al.  3D-FUTURE: 3D Furniture Shape with TextURE , 2020, International Journal of Computer Vision.

[12]  Enrico Gobbetti,et al.  AtlantaNet: Inferring the 3D Indoor Layout from a Single $360^\circ $ Image Beyond the Manhattan World Assumption , 2020, ECCV.

[13]  Yinda Zhang,et al.  GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes , 2020, ECCV.

[14]  Pablo Bauszat,et al.  CoReNet: Coherent 3D scene reconstruction from a single RGB image , 2020, ECCV.

[15]  Angela Dai,et al.  SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans , 2020, ECCV.

[16]  Jia Deng,et al.  RAFT: Recurrent All-Pairs Field Transforms for Optical Flow , 2020, ECCV.

[17]  V. Lepetit,et al.  General 3D Room Layout from a Single View by Render-and-Compare , 2020, ECCV.

[18]  Jordi Pont-Tuset,et al.  Connecting Vision and Language with Localized Narratives , 2019, ECCV.

[19]  Zihan Zhou,et al.  Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling , 2019, ECCV.

[20]  Michael Goesele,et al.  The Replica Dataset: A Digital Replica of Indoor Spaces , 2019, ArXiv.

[21]  Cheng Sun,et al.  HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jan Kautz,et al.  PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Eirikur Agustsson,et al.  Interactive Full Image Segmentation by Considering All Regions Jointly , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Peter Wonka,et al.  DuLa-Net: A Dual-Projection Network for Estimating Room Layouts From a Single RGB Panorama , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Matthias Nießner,et al.  Scan2CAD: Learning CAD Model Alignment in RGB-D Scans , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Qi-Xing Huang,et al.  Domain Transfer Through Deep Activation Matching , 2018, ECCV.

[27]  Wenbin Li,et al.  InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset , 2018, BMVC.

[28]  Shang-Hong Lai,et al.  Indoor Scene Layout Estimation from a Single Image , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[29]  Jitendra Malik,et al.  Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  John Flynn,et al.  Stereo magnification , 2018, ACM Trans. Graph..

[31]  Varun Jampani,et al.  Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[32]  Jimei Yang,et al.  PlaneNet: Piece-Wise Planar Reconstruction from a Single RGB Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Jiajun Wu,et al.  Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Nassir Navab,et al.  Guide Me: Interacting with Deep Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Derek Hoiem,et al.  LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Yuandong Tian,et al.  Building Generalizable Agents with a Realistic and Rich 3D Environment , 2018, ICLR.

[37]  Luc Van Gool,et al.  Deep Extreme Cut: From Extreme Points to Object Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Stefan Leutenegger,et al.  SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation? , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[40]  Marc Pollefeys,et al.  Indoor Scan2BIM: Building information models of house interiors , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[41]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Tomasz Malisiewicz,et al.  RoomNet: End-to-End Room Layout Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Jan-Michael Frahm,et al.  Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[46]  Duc Thanh Nguyen,et al.  SceneNN: A Scene Meshes Dataset with aNNotations , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[47]  Björn Stenger,et al.  Pano2CAD: Room Layout from a Single Panorama Image , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[48]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[51]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[53]  Yinda Zhang,et al.  PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding , 2014, ECCV.

[54]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Ricardo Cabral,et al.  Piecewise Planar and Compact Floorplan Reconstruction from Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[57]  Silvio Savarese,et al.  Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[58]  Antonio Torralba,et al.  Parsing IKEA Objects: Fine Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[59]  Kobus Barnard,et al.  Understanding Bayesian Rooms Using Composite 3D Object Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[60]  David A. Forsyth,et al.  Recovering free space of indoor scenes from a single image , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Daniel Fried,et al.  Bayesian geometric modeling of indoor scenes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Jiebo Luo,et al.  iCoseg: Interactive co-segmentation with intelligent scribble guidance , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[63]  Derek Hoiem,et al.  Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[64]  Toby Sharp,et al.  Image segmentation with a bounding box prior , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[65]  Vladimir Kolmogorov,et al.  "GrabCut": interactive foreground extraction using iterated graph cuts , 2004, ACM Trans. Graph..

[66]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[67]  Nitesh B. Gundavarapu,et al.  Supplementary Material OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets , 2021 .

[68]  Jie Zhou,et al.  Structural Deep Metric Learning for Room Layout Estimation , 2020, ECCV.

[69]  I. Sobol On the distribution of points in a cube and the approximate evaluation of integrals , 1967 .