Beyond Gradient Descent for Regularized Segmentation Losses

Its simplicity has made gradient descent (GD) the default method for training ever deeper and more complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In the context of weakly-supervised CNN segmentation, we demonstrate a well-motivated loss function for which an alternative optimizer, the alternating direction method (ADM), achieves state-of-the-art results while GD performs poorly. Interestingly, GD obtains its best result for a "smoother" tuning of the loss function. The results are consistent across different network architectures. Our loss is motivated by well-understood MRF/CRF regularization models in "shallow" segmentation and their known global solvers. Our work suggests that network design and training should pay more attention to optimization methods.
