Parseval Proximal Neural Networks

The aim of this paper is twofold. First, we show that a certain concatenation of a proximity operator with an affine operator is again a proximity operator on a suitable Hilbert space. Second, we use our findings to establish so-called proximal neural networks (PNNs) and stable tight frame proximal neural networks. Let $$\mathcal{H}$$ and $$\mathcal{K}$$ be real Hilbert spaces, $$b \in \mathcal{K}$$, and $$T \in \mathcal{B}(\mathcal{H},\mathcal{K})$$ a linear operator with closed range and Moore–Penrose inverse $$T^\dagger$$. Based on the well-known characterization of proximity operators by Moreau, we prove that for any proximity operator $$\mathrm{Prox}:\mathcal{K}\rightarrow \mathcal{K}$$ the operator $$T^\dagger \, \mathrm{Prox}(T \cdot + b)$$ is a proximity operator on $$\mathcal{H}$$ equipped with a suitable norm. In particular, it follows for the frequently applied soft shrinkage operator $$\mathrm{Prox} = S_{\lambda}:\ell_2 \rightarrow \ell_2$$ and any frame analysis operator $$T:\mathcal{H}\rightarrow \ell_2$$ that the frame shrinkage operator $$T^\dagger \, S_\lambda \, T$$ is a proximity operator on a suitable Hilbert space. The concatenation of proximity operators on $$\mathbb{R}^d$$ equipped with different norms establishes a PNN. If the network arises from tight frame analysis or synthesis operators, then it forms an averaged operator. In particular, it has Lipschitz constant 1 and belongs to the class of so-called Lipschitz networks, which were recently applied to defend against adversarial attacks. Moreover, due to their averaging property, PNNs can be used within so-called Plug-and-Play algorithms with convergence guarantees. In the case of Parseval frames, we call the networks Parseval proximal neural networks (PPNNs). Then the involved linear operators lie in a Stiefel manifold, and corresponding minimization methods can be applied to train such networks. Finally, some proof-of-concept examples demonstrate the performance of PPNNs.
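The frame shrinkage operator and the layer structure described in the abstract can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' implementation. It assumes the finite-dimensional case $$\mathcal{H} = \mathbb{R}^d$$, uses a random matrix with orthonormal columns (obtained from a QR decomposition) as a stand-in for a learned Parseval frame, and assumes that each layer has the form $$T^{\mathrm{T}} S_\lambda(Tx + b)$$. The function names `soft_shrink`, `parseval_frame`, `frame_shrinkage`, and `ppnn`, as well as all sizes and parameter values, are hypothetical.

```python
import numpy as np


def soft_shrink(x, lam):
    """Componentwise soft shrinkage S_lambda with threshold lam >= 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def parseval_frame(n, d, rng):
    """Random n x d matrix T (n >= d) with T^T T = I_d.

    Such a T is a point on the Stiefel manifold; it is only a random
    stand-in for a trained frame analysis operator (assumption).
    """
    q, _ = np.linalg.qr(rng.standard_normal((n, d)))  # reduced QR: q is n x d
    return q


def frame_shrinkage(T, x, lam, b=None):
    """Evaluate T^dagger S_lambda(T x + b) using the Moore-Penrose inverse."""
    if b is None:
        b = np.zeros(T.shape[0])
    return np.linalg.pinv(T) @ soft_shrink(T @ x + b, lam)


def ppnn(x, layers, lam):
    """Apply layers x -> T^T S_lambda(T x + b) one after another.

    For T with orthonormal columns, T^T equals the pseudoinverse of T,
    so each layer is a frame shrinkage operator with a bias term.
    """
    for T, b in layers:
        x = T.T @ soft_shrink(T @ x + b, lam)
    return x


# Hypothetical usage: three layers mapping R^5 -> R^8 -> R^5.
rng = np.random.default_rng(0)
layers = [(parseval_frame(8, 5, rng), 0.1 * rng.standard_normal(8))
          for _ in range(3)]
x, y = rng.standard_normal(5), rng.standard_normal(5)

# Empirical check of nonexpansiveness (Lipschitz constant at most 1).
ratio = np.linalg.norm(ppnn(x, layers, 0.2) - ppnn(y, layers, 0.2)) \
    / np.linalg.norm(x - y)
print("output distance / input distance:", ratio)
```

Because each `T` has spectral norm 1 and soft shrinkage is nonexpansive, the printed ratio should not exceed 1. This is a spot check on one pair of inputs, not a proof of the averagedness result stated in the abstract.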
