Hierarchical and Spatial Structures for Interpreting Images of Man-made Scenes Using Graphical Models

The task of semantic scene interpretation is to label the regions of an image and their relations into meaningful classes. This task is a key ingredient of many computer vision applications, including object recognition, 3D reconstruction and robotic perception. It is challenging partly because of the ambiguities inherent in the image data. Images of man-made scenes, e.g. building facade images, exhibit strong contextual dependencies in the form of spatial and hierarchical structures. Modelling these structures is central to the interpretation task. Graphical models provide a consistent framework for statistical modelling. Bayesian networks and random fields are two popular types of graphical models that are frequently used for capturing such contextual information. The motivation for our work comes from the belief that we can find a generic formulation for scene interpretation that has the benefits of both random fields and Bayesian networks, together with clear semantic interpretability. Our key contribution is therefore the development of a generic statistical graphical model for scene interpretation, which seamlessly integrates different types of image features with the spatial and hierarchical structural information defined over a multi-scale image segmentation. It unifies the ideas of existing approaches, e.g. the conditional random field (CRF) and the Bayesian network (BN), and has a clear statistical interpretation as the maximum a posteriori (MAP) estimate of a multi-class labelling problem. Given the graphical model structure, we derive the probability distribution of the model based on the factorization property implied by the model structure.
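To make the MAP formulation concrete, the following is a minimal sketch and not the model of this work: it treats the labelling as the minimizer of an energy (a negative log-posterior up to a constant) composed of local, spatial and hierarchical terms over a toy two-scale segmentation. The region layout, the cost values, the Potts-style disagreement penalties and the names `energy` and `map_labelling` are all assumptions chosen for illustration, and the exhaustive search merely stands in for an approximate inference method on a problem this small.

```python
from itertools import product

# Toy two-scale segmentation; every number here is made up for illustration.
# Regions 0-2 are fine-scale segments, region 3 is their coarse-scale parent.
LABELS = ("facade", "window")

# Unary costs (e.g. negative log-scores of a local region classifier),
# one row per region, one column per label.
unary = [
    [0.2, 1.6],  # region 0: clearly facade
    [0.9, 0.5],  # region 1: ambiguous, locally slightly window
    [0.3, 1.2],  # region 2: facade
    [0.4, 1.1],  # region 3 (parent): facade
]
spatial_edges = [(0, 1), (1, 2)]               # neighbours within one scale
hierarchical_edges = [(0, 3), (1, 3), (2, 3)]  # child-parent links across scales


def energy(labels, w_spatial=0.5, w_hier=0.5):
    """Sum of unary costs plus penalties for disagreeing linked regions."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += w_spatial * sum(labels[i] != labels[j] for i, j in spatial_edges)
    e += w_hier * sum(labels[c] != labels[p] for c, p in hierarchical_edges)
    return e


def map_labelling(**weights):
    """Exhaustive search over all labellings; feasible only for toy sizes."""
    candidates = product(range(len(LABELS)), repeat=len(unary))
    return min(candidates, key=lambda ls: energy(ls, **weights))


local_only = map_labelling(w_spatial=0.0, w_hier=0.0)
with_context = map_labelling()
print([LABELS[l] for l in local_only])
print([LABELS[l] for l in with_context])
```

With both weights set to zero each region simply takes its locally cheapest label, so region 1 is labelled as window; with the context terms switched on, its neighbours and parent pull it to facade.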
The statistical model leads to an energy function that can be optimized approximately by either loopy belief propagation or a graph-cut-based move-making algorithm. The particular type of features, spatial structure and hierarchical structure, however, is not prescribed. In the experiments, we concentrate on terrestrial man-made scenes as a particularly difficult problem. We demonstrate the application of the proposed graphical model to the multi-class classification of building facade image regions. By incorporating the spatial and hierarchical structures, the framework yields significantly better classification results on man-made scenes than the standard local classification approach. We investigate the performance of the algorithms on a public dataset to show the relative importance of the information from the spatial structure and the hierarchical structure. As a baseline for region classification, we use an efficient randomized decision forest classifier. Two specific models are derived from the proposed graphical model, namely the hierarchical CRF and the hierarchical mixed graphical model. We show that these two models produce better classification results than both the baseline region classifier and the flat CRF.

To my parents & my wife
