Paper Information - Beyond Gradient Descent for Regularized Segmentation Losses

The simplicity of gradient descent (GD) has made it the default method for training ever deeper and more complex neural networks. Both loss functions and architectures are often explicitly tuned to be amenable to this basic local optimization. In the context of weakly-supervised CNN segmentation, we demonstrate a well-motivated loss function for which an alternative optimizer, the alternating direction method (ADM), achieves state-of-the-art results while GD performs poorly. Interestingly, GD obtains its best result for a "smoother" tuning of the loss function. The results are consistent across different network architectures. Our loss is motivated by well-understood MRF/CRF regularization models in "shallow" segmentation and their known global solvers. Our work suggests that network design and training should pay more attention to optimization methods.
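To make the phrase "known global solvers" more concrete, the following is a minimal sketch and not the authors' method. It assumes a 1D chain of pixels rather than a 2D image, where a Potts-regularized energy (per-pixel cost plus a penalty `lam` for every label change between neighbors) can be minimized exactly by dynamic programming. The function names, the toy probabilities standing in for network outputs, and the value `lam=1.0` are all illustrative assumptions.

```python
import numpy as np

def chain_potts_map(unary, lam):
    """Exact minimizer of sum_p unary[p, x_p] + lam * sum_p [x_p != x_{p+1}]
    on a 1D chain, found by dynamic programming (Viterbi)."""
    n, k = unary.shape
    cost = unary[0].copy()              # best prefix energy ending in each label
    back = np.zeros((n, k), dtype=int)  # back[p, j]: best label at p-1 given label j at p
    for p in range(1, n):
        # trans[i, j]: prefix energy with label i at p-1, plus penalty if j != i
        trans = cost[:, None] + lam * (1.0 - np.eye(k))
        back[p] = trans.argmin(axis=0)
        cost = trans.min(axis=0) + unary[p]
    labels = np.empty(n, dtype=int)
    labels[-1] = cost.argmin()
    for p in range(n - 1, 0, -1):       # trace the optimal path backwards
        labels[p - 1] = back[p, labels[p]]
    return labels

def potts_energy(unary, labels, lam):
    """Energy of a discrete labeling: data term plus Potts boundary penalty."""
    data = unary[np.arange(len(labels)), labels].sum()
    return data + lam * np.count_nonzero(labels[1:] != labels[:-1])

# Hypothetical softmax outputs for 8 pixels and 2 classes (made-up numbers).
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.7, 0.3],
                  [0.2, 0.8], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]])
unary = -np.log(probs)                  # negative log-likelihood as data cost

pixelwise = probs.argmax(axis=1)        # independent per-pixel decision
proposal = chain_potts_map(unary, lam=1.0)

print("pixel-wise argmax:", pixelwise, potts_energy(unary, pixelwise, 1.0))
print("chain Potts optimum:", proposal, potts_energy(unary, proposal, 1.0))
```

On this toy input the per-pixel argmax changes label five times along the chain, while the regularized optimum has a single boundary and a lower total energy. In an alternating scheme, a discrete labeling of this kind could serve as a target for an ordinary cross-entropy update of the network; that coupling is only described here, not implemented.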
