Learning to Predict High-Quality Edge Maps for Room Layout Estimation

The goal of room layout estimation is to predict the three-dimensional box that represents the room spatial structure from a monocular image. In this paper, a deconvolution network is trained first to predict the edge map of a room image. Compared to the previous fully convolutional networks, the proposed deconvolution network has a multilayer deconvolution process that can refine the edge map estimate layer by layer. The deconvolution network also has fully connected layers to aggregate the information of every region throughout the entire image. During the layout generation process, an adaptive sampling strategy is introduced based on the obtained high-quality edge maps. Experimental results prove that the learned edge maps are highly reliable and can produce accurate layouts of room images.

[1]  Silvio Savarese,et al.  DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Dacheng Tao,et al.  Robust Face Recognition via Multimodal Deep Face Representation , 2015, IEEE Transactions on Multimedia.

[3]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[4]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5]  Derek Hoiem,et al.  Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[7]  Yongzhao Zhan,et al.  Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks , 2014, IEEE Transactions on Multimedia.

[8]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[9]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[10]  Jason Gu,et al.  Wi-Fi positioning based on deep learning , 2014, 2014 IEEE International Conference on Information and Automation (ICIA).

[11]  Jaishanker K. Pillai,et al.  Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling , 2015, CVPR 2015.

[14]  Sanja Fidler,et al.  Rent3D: Floor-plan priors for monocular layout estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Kobus Barnard,et al.  Understanding Bayesian Rooms Using Composite 3D Object Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Lin Ma,et al.  Learning structure of stereoscopic image for no-reference quality assessment with convolutional neural network , 2016, Pattern Recognit..

[17]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[18]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[19]  Abhinav Gupta,et al.  Designing deep networks for surface normal estimation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21]  Takeo Kanade,et al.  Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces , 2010, NIPS.

[22]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[24]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[25]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Marc Pollefeys,et al.  Efficient structured prediction for 3D indoor scene understanding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Song-Chun Zhu,et al.  Scene Parsing by Integrating Function, Geometry and Appearance Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Raquel Urtasun,et al.  Efficient Exact Inference for 3D Indoor Scene Understanding , 2012, ECCV.

[29]  C.-C. Jay Kuo,et al.  A Coarse-to-Fine Indoor Layout Estimation (CFILE) Method , 2016, ACCV.

[30]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Daniel Fried,et al.  Bayesian geometric modeling of indoor scenes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Stephen Gould,et al.  Discriminative Learning with Latent Variables for Cluttered Indoor Scene Understanding , 2010, ECCV.

[33]  Takeo Kanade,et al.  Geometric reasoning for single image structure recovery , 2009, CVPR.

[34]  Svetlana Lazebnik,et al.  Learning Informative Edge Maps for Indoor Scene Layout Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).