Variational Networks: An Optimal Control Approach to Early Stopping Variational Methods for Image Restoration

We investigate a well-known phenomenon of variational approaches in image processing, in which the best image quality is typically achieved when the gradient flow process is stopped before converging to a stationary point. This paradox originates from a tradeoff between the optimization and modeling errors of the underlying variational model, and it holds true even if deep learning methods are used to learn highly expressive regularizers from data. In this paper, we take advantage of this paradox and introduce an optimal stopping time into the gradient flow process, which is in turn learned from data by means of an optimal control approach. After a time discretization, we obtain variational networks, which can be interpreted as a particular type of recurrent neural network. The learned variational networks achieve competitive results for image denoising and image deblurring on a standard benchmark data set. One of the key theoretical results is the development of first- and second-order conditions for verifying the optimality of the stopping time. A nonlinear spectral analysis of the gradient of the learned regularizer gives enlightening insights into the different regularization properties.
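To make the construction concrete, the following is a minimal sketch of the stopped gradient flow and its discretization as described above; the notation (data term D, learned regularizer R, step size tau, and stopping time T) is ours for illustration, and the exact parametrization in the paper may differ. The gradient flow of the variational energy is halted at the learned time T rather than run to a stationary point:

\[
\dot{x}(t) = -\nabla\bigl(\mathcal{D}(x(t)) + \mathcal{R}(x(t))\bigr), \qquad x(0) = x_0, \qquad t \in [0, T].
\]

An explicit Euler discretization with S steps of size \(\tau = T/S\) then yields the recursion

\[
x_{k+1} = x_k - \tau\,\nabla\bigl(\mathcal{D}(x_k) + \mathcal{R}(x_k)\bigr), \qquad k = 0, \dots, S-1,
\]

whose unrolled iterations, sharing the regularizer across steps, form the variational network; the stopping time T and the parameters of \(\mathcal{R}\) are the quantities learned via the optimal control formulation.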
