Efficient Inference and Learning for Computer Vision Labelling Problems

Discrete energy minimization has recently emerged as an indispensable tool for computer vision problems. It enables inference of the maximum a posteriori solutions of Markov and conditional random fields, which can be used to model labelling problems in vision. When formulating such problems in an energy minimization framework, three main issues need to be addressed: (i) how to perform efficient inference to compute the optimal solution; (ii) how to incorporate prior knowledge into the model; and (iii) how to learn the parameter values. This thesis focusses on these aspects and presents novel solutions to address them. As computer vision moves towards the era of large videos and gigapixel images, computational efficiency is becoming increasingly important. We present two novel methods to improve the efficiency of energy minimization algorithms. The first method works by “recycling” results from previous problem instances. The second simplifies the energy minimization problem by “reducing” the number of variables in the energy function. We demonstrate a substantial improvement in running time on various labelling problems, such as interactive image and video segmentation, object recognition, and stereo matching. In the second part of the thesis we explore the use of natural image statistics for the single view reconstruction problem, where the task is to recover a theatre-stage representation (containing planar surfaces and their geometrical relationships to each other) from a single 2D image. To this end, we introduce a class of multi-label higher order functions to model these statistics based on the distribution of geometrical features of planar surfaces. We also show that this new class of functions can be minimized exactly with efficient graph cut methods. The third part of the thesis addresses the problem of learning the parameters of the energy function. Although several methods have been proposed to learn the model parameters from training data, they suffer from various drawbacks, such as limited applicability or noisy estimates due to poor approximations. We present an accurate and efficient learning method, and demonstrate that it is widely applicable.
