HADES: Hierarchical Approximate Decoding for Structured Prediction

Many fundamental tasks in machine learning require predicting complex objects rather than a simple yes-no answer or a number. Structured Output Prediction deals with learning such complex objects, which model an inherent structure between interdependent variables. Recently, large margin methods like Structured SVMs (SSVM) have gained popularity for this task due to efficient and general optimization techniques. These optimization algorithms typically rely on solving an inference or decoding sub-problem at every iteration, which is computationally expensive. Moreover, little is known about learning a structured model when this sub-problem is solved only approximately. To address these issues, this thesis introduces a general technique for learning from a series of coarse-to-fine approximate candidates, based on the recent Block-Coordinate Frank-Wolfe algorithm for Structured SVMs. Our core observation is that a model can be learned to a reasonable degree even from approximate solutions. The technique is presented in the context of a popular Computer Vision problem: Semantic Image Segmentation. We pose the decoding sub-problem as that of solving a series of increasingly complex surrogate Conditional Random Fields (CRF) in search of a candidate that meets the required approximation quality. We evaluate our technique on natural scene image segmentation using the MSRC-21 dataset. Our experiments indicate that even extremely approximate solutions, which are 50x faster to decode, contribute to learning under our strategy. We achieve the same accuracy as our baseline and, in addition, reach a reasonable accuracy 1.5x-4x faster.
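The coarse-to-fine search described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `coarse_to_fine_decode` and `make_block_decoder`, the toy unary scores, the use of variable grouping as the surrogate, and the fixed `required_quality` threshold are all assumptions made for illustration. Each surrogate ties blocks of variables to a single label, finer surrogates use smaller blocks, and the search stops at the first candidate whose objective value meets the threshold.

```python
from typing import Callable, List, Sequence, Tuple

Labeling = Tuple[int, ...]


def make_block_decoder(unary: List[List[float]], block_size: int) -> Callable[[], Labeling]:
    """Build a hypothetical surrogate decoder that forces every block of
    `block_size` consecutive variables to share one label (coarser = cheaper)."""
    num_labels = len(unary[0])

    def decode() -> Labeling:
        labels: List[int] = []
        for start in range(0, len(unary), block_size):
            block = unary[start:start + block_size]
            # Pick the label with the highest summed unary score over the block.
            totals = [sum(row[k] for row in block) for k in range(num_labels)]
            best_label = max(range(num_labels), key=lambda k: totals[k])
            labels.extend([best_label] * len(block))
        return tuple(labels)

    return decode


def coarse_to_fine_decode(
    decoders: Sequence[Callable[[], Labeling]],
    objective: Callable[[Labeling], float],
    required_quality: float,
) -> Tuple[Labeling, int]:
    """Try decoders from coarsest to finest and return the first candidate
    whose objective meets `required_quality`, together with its level.
    If none qualifies, return the best candidate seen."""
    if not decoders:
        raise ValueError("at least one decoder is required")
    best, best_value, best_level = None, float("-inf"), -1
    for level, decode in enumerate(decoders):
        candidate = decode()
        value = objective(candidate)
        if value > best_value:
            best, best_value, best_level = candidate, value, level
        if value >= required_quality:
            return candidate, level
    return best, best_level


# Toy problem (assumed data): 8 binary variables with unary scores only.
unary = [[0.2, 0.8], [0.6, 0.4], [0.1, 0.9], [0.7, 0.3],
         [0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]


def objective(y: Labeling) -> float:
    """Score of a full labeling under the original (fine) model."""
    return sum(unary[i][label] for i, label in enumerate(y))


# Block sizes 8, 4, 2, 1 give a coarse-to-fine sequence of surrogates.
decoders = [make_block_decoder(unary, size) for size in (8, 4, 2, 1)]

candidate, level = coarse_to_fine_decode(decoders, objective, required_quality=4.9)
print(candidate, level)
```

In this toy setting a looser threshold is satisfied by a coarser, cheaper surrogate, while a stricter one forces the search down to the finest level. How the required quality is chosen during training is left open here.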
