Shape grammar parsing: application to image-based modeling

The purpose of this thesis was to perform facade image parsing with shape grammars in order to tackle single-view image-based 3D building modeling. The scope of the thesis lay at the border of Computer Graphics and Computer Vision, both in terms of methods and applications.

Two different and complementary approaches were proposed: a bottom-up parsing algorithm that grouped similar regions of a facade image so as to retrieve the underlying layout, and a top-down parsing algorithm based on Reinforcement Learning. This novel top-down algorithm uses pixel-wise image supports, obtained by supervised learning, within a global optimization of a Markov Decision Process.

Both methods were evaluated quantitatively and qualitatively. The second one was shown to support various architectures, several shape grammars and image supports, and it proved robust to challenging viewing conditions, namely adverse illumination and large occlusions. It outperformed the state of the art in both segmentation quality and speed. It also provides a more flexible framework, in which many extensions may be envisioned.

The conclusion of this work was that the problem of single-view image-based 3D building modeling could be solved elegantly by using shape grammars as a Rosetta stone to decipher the language of Architecture through a well-suited Reinforcement Learning formulation. This solution was a potential answer to large-scale reconstruction of urban environments from images, and it also suggested the possibility of introducing Reinforcement Learning in other vision tasks such as generic image parsing, where it has barely been explored so far.
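
To make the top-down idea concrete, here is a minimal sketch of parsing cast as a Markov Decision Process and solved with tabular Q-learning. It is a toy illustration under strong assumptions, not the algorithm of the thesis: the facade is reduced to a single column of rows, the grammar is a hypothetical one that alternates "wall" and "window" segments, and the image support is a per-row score for each label. The names `learn_parse`, `SYMBOLS`, `support` and `sizes` are invented for this example.

```python
import random

SYMBOLS = ("wall", "window")  # terminals of a toy alternating split grammar (assumed)


def learn_parse(support, sizes=(1, 2, 3), episodes=5000, alpha=0.5, eps=0.5, seed=0):
    """Learn a segmentation of a 1D facade column with tabular Q-learning.

    support[y][label] is the merit of giving `label` to row y.
    A state is (row position, index of the next symbol); an action is the
    height of the next segment. `sizes` must contain 1 so that every
    non-terminal state has at least one legal action.
    """
    rng = random.Random(seed)
    n = len(support)
    q = {}  # (pos, sym, size) -> estimated return

    def actions(pos):
        return [s for s in sizes if pos + s <= n]

    def value(pos, sym):
        # Best estimated return from a state; 0.0 at the terminal state.
        return max((q.get((pos, sym, a), 0.0) for a in actions(pos)), default=0.0)

    def greedy(pos, sym):
        return max(actions(pos), key=lambda a: q.get((pos, sym, a), 0.0))

    for _ in range(episodes):
        pos, sym = 0, 0  # every derivation starts at the top with a wall segment
        while pos < n:
            # Epsilon-greedy choice of the next segment height.
            a = rng.choice(actions(pos)) if rng.random() < eps else greedy(pos, sym)
            # Reward: how well the image support agrees with the segment label.
            r = sum(support[y][SYMBOLS[sym]] for y in range(pos, pos + a))
            nxt_pos, nxt_sym = pos + a, 1 - sym
            # Undiscounted Q-learning update (episodes always terminate).
            old = q.get((pos, sym, a), 0.0)
            q[(pos, sym, a)] = old + alpha * (r + value(nxt_pos, nxt_sym) - old)
            pos, sym = nxt_pos, nxt_sym

    # Read out the parse by following the learned greedy policy.
    parse, pos, sym = [], 0, 0
    while pos < n:
        a = greedy(pos, sym)
        parse.append((SYMBOLS[sym], a))
        pos, sym = pos + a, 1 - sym
    return parse


if __name__ == "__main__":
    rows = ["wall", "wall", "window", "window", "window", "wall"]
    # Idealized support: score 1.0 for the true label of each row, 0.0 otherwise.
    support = [{s: 1.0 if s == t else 0.0 for s in SYMBOLS} for t in rows]
    print(learn_parse(support))
```

In this sketch the grammar constrains which derivations are possible (segments must alternate), while the learned value function decides where the splits fall. A real facade parser would work in 2D, use a richer grammar, and take its support from a trained pixel classifier.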
