Paper information - 3D Room Layout Estimation From a Single RGB Image

3D Room Layout Estimation From a Single RGB Image

3D layout is crucial for scene understanding and reconstruction, and is useful in applications such as real estate and furniture design. In this paper, we propose a fully automatic solution to estimate the 3D layout of an indoor scene from a single 2D image. Our technique contains two key components. First, we train a neural network that directly estimates room structure lines from the input image. Second, we propose a novel technique to automatically identify the layout topology of an input image, followed by a nonlinear optimization with equality constraints to estimate the final 3D layout of the scene. To the best of our knowledge, this is the first fully automatic technique to achieve single-image 3D layout estimation of an indoor scene. We evaluate our method on the public datasets LSUN, Hedau, and 3DGP, and the results show that the proposed method achieves accurate 3D layout reconstruction on various images with different layout topologies.
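
The abstract does not spell out how 2D structure lines are lifted into 3D, so the following is only a minimal sketch of one standard ingredient, not the paper's network or its optimizer. It back-projects a pixel on a floor-wall boundary onto the floor plane, which imposes a single equality constraint (the point lies on the floor). The pinhole intrinsics `f`, `cx`, `cy`, the camera height `cam_height`, the level-camera assumption, and the function name `backproject_to_floor` are all hypothetical choices made for illustration.

```python
import math


def backproject_to_floor(u, v, f, cx, cy, cam_height):
    """Intersect the viewing ray through pixel (u, v) with the floor plane.

    Assumed camera frame: x right, y down, z forward, camera held level.
    The floor is the plane y = cam_height (an equality constraint).
    """
    # Ray direction for a pinhole camera with focal length f (in pixels).
    dx = (u - cx) / f
    dy = (v - cy) / f
    dz = 1.0
    if dy <= 0:
        raise ValueError("pixel is at or above the horizon; ray misses the floor")
    # Scale the ray so that its y-coordinate equals the camera height.
    t = cam_height / dy
    return (t * dx, t * dy, t * dz)


# Two hypothetical endpoints of a floor-wall boundary line in a 640x480 image.
corner_a = backproject_to_floor(320.0, 400.0, f=500.0, cx=320.0, cy=240.0, cam_height=1.5)
corner_b = backproject_to_floor(480.0, 400.0, f=500.0, cx=320.0, cy=240.0, cam_height=1.5)

# Metric length of the wall base between the two recovered 3D points.
wall_length = math.dist(corner_a, corner_b)
print(corner_a, corner_b, wall_length)
```

In a full pipeline, many such constraints (points on shared planes, perpendicular walls) would presumably be combined and solved jointly; the closed-form case above only shows how a single planar constraint fixes the depth of one image point.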
