Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond

We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning in dense conditional random fields (CRFs). The algorithm optimizes a nonconvex continuous relaxation of the CRF inference problem using vanilla Frank-Wolfe with approximate updates, which are equivalent to minimizing a regularized energy function. Our proposed method generalizes existing algorithms such as mean field and the concave-convex procedure. This perspective not only offers a unified analysis of these algorithms, but also makes it easy to explore different variants that potentially yield better performance. We illustrate this in our empirical results on standard semantic segmentation datasets, where several instantiations of regularized Frank-Wolfe outperform mean field inference, both as a standalone component and as an end-to-end trainable layer in a neural network. We also show that dense CRFs, coupled with our new algorithms, produce significant improvements over strong CNN baselines.
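To make the idea of a regularized update concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pairwise energy E(x) = sum_i <unary_i, x_i> + 0.5 * sum_{i,j} weights[i][j] * x_i^T compat x_j relaxed over a product of probability simplices, with symmetric `weights` and `compat` and a zero diagonal in `weights`. It also assumes a negative-entropy regularizer scaled by `lam`, for which the regularized linear step has a closed-form softmax solution. All names (`regularized_frank_wolfe`, `lam`, `step`, and so on) are hypothetical. Under these assumptions, setting `lam=1` and `step=1` gives an update of the same form as parallel mean field.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gradient(x, unary, weights, compat):
    """Gradient of the relaxed energy, assuming symmetric weights and compat."""
    n, d = len(x), len(x[0])
    grad = [row[:] for row in unary]
    for i in range(n):
        for j in range(n):
            if weights[i][j] == 0.0:
                continue
            for a in range(d):
                grad[i][a] += weights[i][j] * sum(
                    compat[a][b] * x[j][b] for b in range(d)
                )
    return grad


def regularized_frank_wolfe(unary, weights, compat, lam=1.0, step=1.0, iters=10):
    """Frank-Wolfe where the linear step is regularized by negative entropy.

    Plain Frank-Wolfe would pick a one-hot vertex per node (argmin of the
    gradient). Adding lam * negative entropy turns that step into a softmax.
    """
    # Initialize from the unary terms alone (an illustrative choice).
    x = [softmax([-u for u in row]) for row in unary]
    for _ in range(iters):
        g = gradient(x, unary, weights, compat)
        # Regularized step: argmin_s <g_i, s> - lam * H(s) on the simplex.
        s = [softmax([-v / lam for v in row]) for row in g]
        # Convex combination keeps every row on the simplex.
        x = [
            [(1.0 - step) * xi + step * si for xi, si in zip(x_row, s_row)]
            for x_row, s_row in zip(x, s)
        ]
    return x


# Toy problem: a 3-node chain with 2 labels and a Potts compatibility.
unary = [[0.0, 2.0], [0.6, 0.4], [0.0, 2.0]]
weights = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
compat = [[0.0, 1.0], [1.0, 0.0]]

marginals = regularized_frank_wolfe(unary, weights, compat, lam=1.0, step=1.0)
labels = [row.index(max(row)) for row in marginals]
print(labels)
```

In this toy problem the middle node weakly prefers label 1 on its own, but the pairwise terms pull it toward the label of its two neighbors. Other choices of regularizer or step size would give different variants within the same loop; the softmax form is specific to the entropy assumption made here.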
