Efficient Structured Parsing of Facades Using Dynamic Programming

We propose a sequential optimization technique for segmenting a rectified image of a facade into semantic categories. Our method retrieves a parsing which respects common architectural constraints and also returns a certificate for global optimality. Contrasting the suggested method, the considered facade labeling problem is typically tackled as a classification task or as grammar parsing. Both approaches are not capable of fully exploiting the regularity of the problem. Therefore, our technique very significantly improves the accuracy compared to the state-of-the-art while being an order of magnitude faster. In addition, in 85% of the test images we obtain a certificate for optimality.

[1]  Qinping Zhao,et al.  Rectilinear parsing of architecture in urban environment , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Andreas Wendel,et al.  Façade Segmentation in a Multi-view Scenario , 2011, 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission.

[3]  Jon Bentley,et al.  Programming pearls: algorithm design techniques , 1984, CACM.

[4]  Marc Pollefeys,et al.  Efficient structured prediction for 3D indoor scene understanding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Alexei A. Efros,et al.  From 3D scene geometry to human workspace , 2011, CVPR 2011.

[6]  Jon Louis Bentley Programming pearls: perspective on performance , 1984, CACM.

[7]  Stephen Gould,et al.  Discriminative learning with latent variables for cluttered indoor scene understanding , 2010, CACM.

[8]  Vladimir Kolmogorov,et al.  What energy functions can be minimized via graph cuts? , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[10]  Christoph H. Lampert,et al.  Efficient Subwindow Search: A Branch and Bound Framework for Object Localization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Alexei A. Efros,et al.  Scene Semantics from Long-Term Observation of People , 2012, ECCV.

[12]  George Stiny,et al.  Pictorial and Formal Aspects of Shape and Shape Grammars , 1975 .

[13]  Carl Olsson,et al.  Optimal Estimation of Perspective Camera Pose , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[14]  Raquel Urtasun,et al.  Efficient Exact Inference for 3D Indoor Scene Understanding , 2012, ECCV.

[15]  L. Van Gool,et al.  AUTOMATIC ARCHITECTURAL STYLE RECOGNITION , 2012 .

[16]  Pushmeet Kohli,et al.  Associative Hierarchical Random Fields , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[18]  Tomás Werner,et al.  A Linear Programming Approach to Max-Sum Problem: A Review , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Iasonas Kokkinos,et al.  Shape grammar parsing via Reinforcement Learning , 2011, CVPR 2011.

[20]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[21]  Nikos Paragios,et al.  Segmentation of building facades using procedural shape priors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[23]  Takeo Kanade,et al.  Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces , 2010, NIPS.

[24]  Jianxiong Xiao,et al.  Image-based street-side city modeling , 2009, SIGGRAPH 2009.

[25]  Sanja Fidler,et al.  Box in the Box: Joint 3D Layout and Object Reasoning from Single Images , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Marc Pollefeys,et al.  Handling Urban Location Recognition as a 2D Homothetic Problem , 2010, ECCV.

[27]  Svetha Venkatesh,et al.  Efficient algorithms for subwindow search in object detection and localization , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Derek Hoiem,et al.  Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29]  Wolfgang Förstner,et al.  eTRIMS Image Database for Interpreting Images of Man-Made Scenes , 2009 .

[30]  Richard I. Hartley,et al.  Global Optimization through Rotation Space Search , 2009, International Journal of Computer Vision.

[31]  Thomas Vetter,et al.  Optimal landmark detection using shape models and branch and bound , 2011, 2011 International Conference on Computer Vision.

[32]  Alexei A. Efros,et al.  Automatic photo pop-up , 2005, ACM Trans. Graph..

[33]  Luc Van Gool,et al.  A Three-Layered Approach to Facade Parsing , 2012, ECCV.

[34]  Olga Veksler,et al.  Tiered scene labeling with dynamic programming , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[36]  Jianxiong Xiao,et al.  Image-based street-side city modeling , 2009, ACM Trans. Graph..