Translating Diffusion, Wavelets, and Regularisation into Residual Networks

Convolutional neural networks (CNNs) often perform well, but their stability is poorly understood. To address this problem, we consider the simple prototypical task of signal denoising, where classical approaches such as nonlinear diffusion, wavelet-based methods, and regularisation offer provable stability guarantees. To transfer such guarantees to CNNs, we interpret numerical approximations of these classical methods as a specific residual network (ResNet) architecture. This yields a dictionary that translates diffusivities, shrinkage functions, and regularisers into activation functions, and enables direct communication between the four research communities. On the CNN side, it not only inspires new families of nonmonotone activation functions, but also introduces intrinsically stable architectures for an arbitrary number of layers.
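To make the dictionary concrete, the following is a minimal sketch (not the paper's reference implementation) of how one explicit step of 1D nonlinear diffusion becomes a residual block: a discrete derivative K is the inner convolution, its negated adjoint -K^T is the outer convolution, and the flux Phi(s) = g(s^2) s plays the role of the activation function. Here we assume the classical Perona-Malik diffusivity g(s^2) = 1/(1 + s^2/lam^2); the function names, step size tau, and contrast parameter lam are illustrative.

```python
import numpy as np

def perona_malik_flux(s, lam=1.0):
    """Flux Phi(s) = g(s^2) * s with the Perona-Malik diffusivity
    g(s^2) = 1 / (1 + s^2 / lam^2). Note that Phi is nonmonotone:
    it decreases for |s| > lam, mirroring the nonmonotone activation
    functions that the diffusion-to-ResNet dictionary suggests."""
    return s / (1.0 + (s / lam) ** 2)

def diffusion_residual_block(u, tau=0.2, lam=1.0):
    """One explicit step of 1D nonlinear diffusion,
        u^{k+1} = u^k - tau * K^T Phi(K u^k),
    read as a residual block: the derivative filter K is the inner
    convolution, -K^T the outer one, and Phi the activation."""
    du = np.diff(u)                       # inner convolution: K u
    phi = perona_malik_flux(du, lam)      # activation: Phi(K u)
    # Outer convolution -K^T with reflecting (Neumann) boundaries:
    div = np.concatenate(([phi[0]], np.diff(phi), [-phi[-1]]))
    return u + tau * div                  # residual (skip) connection

# Usage: denoise a noisy step signal with a deep stack of such blocks.
rng = np.random.default_rng(0)
u = np.where(np.arange(128) < 64, 0.0, 1.0) + 0.1 * rng.standard_normal(128)
for _ in range(100):                      # 100 blocks = 100 diffusion steps
    u = diffusion_residual_block(u)
```

Since the diffusivity is bounded by 1 on a unit grid, a step size tau <= 0.5 keeps each explicit step (and hence each block) stable in the maximum norm, which illustrates the kind of guarantee that holds regardless of the number of stacked layers.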
