Models for learning spatial interactions in natural images

Classifying image components (pixels, regions, and objects) into meaningful categories is a challenging task because of the ambiguities inherent in visual data. Natural images exhibit strong contextual dependencies in the form of spatial interactions among components. For example, neighboring pixels tend to have similar class labels, and different parts of an object are related through geometric constraints. Beyond these, different regions (e.g., sky and water) or objects (e.g., monitor and keyboard) appear in restricted spatial configurations. Modeling these interactions is crucial for achieving good classification accuracy. In this thesis, we present discriminative field models that capture spatial interactions in images within a discriminative framework based on the concept of Conditional Random Fields proposed by Lafferty et al. Discriminative fields offer several advantages over the Markov Random Fields (MRFs) popularly used in computer vision. First, they can capture arbitrary dependencies in the observed data by relaxing the restrictive assumption of conditional independence generally made in MRFs for tractability. Second, the interaction between labels in discriminative fields is based on the observed data instead of being fixed a priori as in MRFs. This is critical for incorporating different types of context in images within a single framework. Finally, discriminative fields derive their classification power from probabilistic discriminative models instead of the generative models used in MRFs. Since the graphs induced by discriminative fields may have arbitrary topology, exact maximum likelihood parameter learning may not be feasible. We present an approach that approximates the gradients of the likelihood with simple piecewise constant functions constructed using inference techniques. To exploit different levels of contextual information in images, a two-layer hierarchical formulation is also described. It encodes both short-range interactions (e.g., pixelwise label smoothing) and long-range interactions (e.g., relative configurations of objects or regions) in a tractable manner. The models proposed in this thesis are general enough to be applied, within a single framework, to several challenging computer vision tasks such as contextual object detection, semantic scene segmentation, texture recognition, and image denoising.
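
The contrast between a label interaction fixed a priori and one driven by the observed data can be illustrated with a small sketch. It is not the thesis implementation: it assumes binary labels in {-1, +1} on a 1-D chain of sites, a logistic local term, and a pairwise term whose strength is linear in the absolute difference of neighboring observations. All function names, features, and weights below are hypothetical choices made for illustration.

```python
import math
from itertools import product

def log_sigmoid(t):
    # Numerically stable log(1 / (1 + exp(-t))).
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))

def association(x_i, y_i, w):
    # Local discriminative term: log-probability of label x_i in {-1, +1}
    # under a logistic model on the (assumed) site feature vector [1, y_i].
    return log_sigmoid(x_i * (w[0] + w[1] * y_i))

def interaction(x_i, x_j, y_i, y_j, v):
    # Pairwise term whose strength depends on the observations.
    # With v[1] == 0 it reduces to a fixed, data-independent smoothing prior.
    return x_i * x_j * (v[0] + v[1] * abs(y_i - y_j))

def log_score(labels, obs, w, v):
    # Unnormalized log P(labels | obs) on a 1-D chain of sites.
    total = sum(association(x, y, w) for x, y in zip(labels, obs))
    for i in range(len(labels) - 1):
        total += interaction(labels[i], labels[i + 1], obs[i], obs[i + 1], v)
    return total

def map_labels(obs, w, v):
    # Brute-force MAP over all labelings; only viable for tiny chains.
    return max(product((-1, 1), repeat=len(obs)),
               key=lambda labels: log_score(labels, obs, w, v))
```

For example, with `obs = [1.0, 1.0, -0.1, 1.0]` and `w = (0.0, 4.0)`, the fixed prior `v = (1.0, 0.0)` smooths the third site away and returns `(1, 1, 1, 1)`, whereas the data-dependent choice `v = (1.0, -1.0)` weakens the interaction across the observed discontinuity and returns `(1, 1, -1, 1)`. Exhaustive search is used only to keep the sketch self-contained; on graphs of realistic size, approximate inference would be needed.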
