Global structured models towards scene understanding

Many scene understanding tasks are formulated as a labelling problem that assigns a label to each pixel of an image. The discrete labels vary with the task: they may correspond to different object classes such as car, grass or sky, to depths, or to intensities after denoising. These labelling problems are typically formulated as a pairwise Markov or Conditional Random Field, which models the dependencies between the labels of pairs of variables in local neighbourhoods. However, pairwise models are very restricted in their expressivity: they can neither model rich natural statistics nor induce the desired complex structures in the output labelling. In this thesis we propose global structured formulations beyond pairwise models, and show that they are very useful in computer vision and that they can still be learnt and optimised efficiently. First, we propose a model that generalises existing approaches for semantic object class segmentation formulated in terms of pixels, segments or groups of segments. The proposed method efficiently integrates the strengths of these different approaches, capturing discriminative information across different scales. Next, we show how the standard approaches to semantic object class segmentation can be improved by including costs based on high-level statistics, such as object class co-occurrence, which capture knowledge of scene semantics, for example that motorbikes and cows are unlikely to occur together in an image. Then we propose a novel latent random field support vector machine for object detection with a convex MRF regularisation, and suggest a way to include this information in the object class segmentation formulation. Finally, we propose a model that jointly estimates labellings of multiple domains over a product space of labels.
We demonstrate the usefulness of this model on the problem of joint object class semantic segmentation and dense 3D stereo reconstruction, and show that this approach significantly outperforms existing methods. We show that all proposed models can be optimised efficiently using powerful graph cut based move making algorithms.

List of Publications

Journals

Ľubor Ladický, Chris Russell, Pushmeet Kohli, Philip H.S. Torr. Inference Methods for CRFs with Co-occurrence Statistics. International Journal of Computer Vision, 2011. Invited Paper.
Ľubor Ladický, Paul Sturgess, Chris Russell, Sunando Sengupta, Yalin Bastanlar, William Clocksin, Philip H.S. Torr. Joint Optimisation for Object Class Segmentation and Dense Stereo Reconstruction. International Journal of Computer Vision, 2011. Invited Paper.
Pushmeet Kohli, Ľubor Ladický, Philip H.S. Torr. Robust Higher Order Potentials for Enforcing Label Consistency. International Journal of Computer Vision, 2009.

Conferences

Ľubor Ladický, Philip H.S. Torr. Locally Linear Support Vector Machines. International Conference on Machine Learning, 2011.
Ľubor Ladický, Chris Russell, Pushmeet Kohli, Philip H.S. Torr. Graph Cut based Inference with Co-occurrence Statistics. European Conference on Computer Vision, 2010. Best Paper Award.
Ľubor Ladický, Paul Sturgess, Karteek Alahari, Chris Russell, Philip H.S. Torr. What, Where & How Many? Combining Object Detectors and CRFs. European Conference on Computer Vision, 2010.
Ľubor Ladický, Paul Sturgess, Chris Russell, Sunando Sengupta, Yalin Bastanlar, William Clocksin, Philip H.S. Torr. Joint Optimisation for Object Class Segmentation and Dense Stereo Reconstruction. British Machine Vision Conference, 2010. Best Paper Award.
Chris Russell, Ľubor Ladický, Pushmeet Kohli, Philip H.S. Torr. Exact and Approximate Inference in Associative Hierarchical Networks using GraphCuts. Conference on Uncertainty in Artificial Intelligence, 2010.
Ľubor Ladický, Chris Russell, Pushmeet Kohli, Philip H.S. Torr. Associative Hierarchical CRFs for Object Class Image Segmentation. International Conference on Computer Vision, 2009.
Paul Sturgess, Karteek Alahari, Ľubor Ladický, Philip H.S. Torr. Combining Appearance and Structure from Motion Features for Road Scene Understanding. British Machine Vision Conference, 2009.
Pushmeet Kohli, Ľubor Ladický, Philip H.S. Torr. Robust Higher Order Potentials for Enforcing Label Consistency. Conference on Computer Vision and Pattern Recognition, 2008.
