Room Layout Estimation with Object and Material Attributes Information Using a Spherical Camera

In this paper we propose a pipeline for estimating 3D room layout with object and material attribute prediction using a spherical stereo image pair. We assume that the room and objects can be represented as cuboids aligned to the main axes of the room coordinate (Manhattan world). A spherical stereo alignment algorithm is proposed to align two spherical images to the global world coordinate system. Depth information of the scene is estimated by stereo matching between images. Cubic projection images of the spherical RGB and estimated depth are used for object and material attribute detection. A single Convolutional Neural Network is designed to assign object and attribute labels to geometrical elements built from the spherical image. Finally simplified room layout is reconstructed by cuboid fitting. The reconstructed cuboid-based model shows the structure of the scene with object information and material attributes.

[1]  Nathan Silberman,et al.  Indoor scene segmentation using a structured light sensor , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[2]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[3]  Marc Pollefeys,et al.  Interactive 3D architectural modeling from unordered photo collections , 2008, SIGGRAPH 2008.

[4]  Richard Szeliski,et al.  Reconstructing building interiors from images , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[5]  Alexei A. Efros,et al.  Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Andrew J. Davison,et al.  A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[7]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[8]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Carsten Rother,et al.  Dense Semantic Image Segmentation with Objects and Attributes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[12]  Andreas Geiger,et al.  Omnidirectional 3D reconstruction in augmented Manhattan worlds , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[13]  Gang Wang,et al.  Unsupervised Joint Feature Learning and Encoding for RGB-D Scene Labeling , 2015, IEEE Transactions on Image Processing.

[14]  Shi-Min Hu,et al.  3D indoor scene modeling from RGB-D data: a survey , 2015, Computational Visual Media.

[15]  Richard Szeliski,et al.  Manhattan-world stereo , 2009, CVPR.

[16]  Feng Wu,et al.  Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Shigang Li Real-Time Spherical Stereo , 2006, ICPR.

[18]  Jitendra Malik,et al.  Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation , 2015, International Journal of Computer Vision.

[19]  Andrew Owens,et al.  SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Olaf Kähler,et al.  Efficient 3D Scene Labeling Using Fields of Trees , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Vladlen Koltun,et al.  Robust reconstruction of indoor scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jianxiong Xiao,et al.  Reconstructing the World’s Museums , 2012, International Journal of Computer Vision.

[23]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Adrian Hilton,et al.  3D Scene Reconstruction from Multiple Spherical Stereo Pairs , 2013, International Journal of Computer Vision.

[25]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[26]  Jianfei Cai,et al.  Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation , 2015, J. Vis. Commun. Image Represent..

[27]  Jiri Matas,et al.  Robust Detection of Lines Using the Progressive Probabilistic Hough Transform , 2000, Comput. Vis. Image Underst..

[28]  Allan Zelener Survey of Object Classification in 3 D Range Scans , 2014 .

[29]  Adrian Hilton,et al.  Block world reconstruction from spherical stereo image pairs , 2015, Comput. Vis. Image Underst..

[30]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[31]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.