Inferring 3D layout of building facades from a single image

In this paper, we propose a novel algorithm that infers the 3D layout of building facades from a single 2D image of an urban scene. Different from existing methods that only yield coarse orientation labels or qualitative block approximations, our algorithm quantitatively reconstructs building facades in 3D space using a set of planes mutually related by 3D geometric constraints. Each plane is characterized by a continuous orientation vector and a depth distribution. An optimal solution is reached through inter-planar interactions. Due to the quantitative and plane-based nature of our geometric reasoning, our model is more expressive and informative than existing approaches. Experiments show that our method compares competitively with the state of the art on both 2D and 3D measures, while yielding a richer interpretation of the 3D scene behind the image.

[1]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[2]  Matthew Brand,et al.  Lifting 3D Manhattan Lines from a Single Image , 2013, 2013 IEEE International Conference on Computer Vision.

[3]  Song-Chun Zhu,et al.  Scene Parsing by Integrating Function, Geometry and Appearance Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Sanja Fidler,et al.  Box in the Box: Joint 3D Layout and Object Reasoning from Single Images , 2013, 2013 IEEE International Conference on Computer Vision.

[5]  Robert T. Collins,et al.  Vanishing point calculation as a statistical inference on the unit sphere , 1990, [1990] Proceedings Third International Conference on Computer Vision.

[6]  Silvio Savarese,et al.  Understanding Indoor Scenes Using 3D Geometric Phrases , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Alexei A. Efros,et al.  Recovering Occlusion Boundaries from an Image , 2011, International Journal of Computer Vision.

[8]  Tom Heskes,et al.  Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy , 2002, NIPS.

[9]  Wei Zhang,et al.  Video Compass , 2002, ECCV.

[10]  Takeo Kanade,et al.  Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces , 2010, NIPS.

[11]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[12]  Ce Liu,et al.  Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Alexei A. Efros,et al.  Closing the loop in scene interpretation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[15]  Beatrice Brillault-O'Mahony,et al.  New method for vanishing point detection , 1991, CVGIP Image Underst..

[16]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  David A. Forsyth,et al.  Recovering free space of indoor scenes from a single image , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Seth J. Teller,et al.  Automatic recovery of relative camera rotations for urban scenes , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[19]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[20]  Alexei A. Efros,et al.  From 3D scene geometry to human workspace , 2011, CVPR 2011.

[21]  T. Heskes Stable Fixed Points of Loopy Belief Propagation Are Minima of the Bethe Free Energy , 2002 .

[22]  Stephen Gould,et al.  Single image depth estimation from predicted semantic labels , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  T. Kanade,et al.  Geometric reasoning for single image structure recovery , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Martial Hebert,et al.  Stacked Hierarchical Labeling , 2010, ECCV.

[25]  Ashutosh Saxena,et al.  Learning 3-D Scene Structure from a Single Still Image , 2007, 2007 IEEE 11th International Conference on Computer Vision.