Paper Information - Parseval Proximal Neural Networks

Parseval Proximal Neural Networks

The aim of this paper is twofold. First, we show that a certain concatenation of a proximity operator with an affine operator is again a proximity operator on a suitable Hilbert space. Second, we use our findings to establish so-called proximal neural networks (PNNs) and stable tight frame proximal neural networks. Let $$\mathcal{H}$$ and $$\mathcal{K}$$ be real Hilbert spaces, $$b \in \mathcal{K}$$, and $$T \in \mathcal{B}(\mathcal{H},\mathcal{K})$$ a linear operator with closed range and Moore–Penrose inverse $$T^\dagger$$. Based on the well-known characterization of proximity operators by Moreau, we prove that for any proximity operator $$\mathrm{Prox}:\mathcal{K}\rightarrow \mathcal{K}$$ the operator $$T^\dagger \, \mathrm{Prox}(T \cdot + b)$$ is a proximity operator on $$\mathcal{H}$$ equipped with a suitable norm. In particular, it follows for the frequently applied soft shrinkage operator $$\mathrm{Prox} = S_{\lambda}:\ell_2 \rightarrow \ell_2$$ and any frame analysis operator $$T:\mathcal{H}\rightarrow \ell_2$$ that the frame shrinkage operator $$T^\dagger \, S_\lambda \, T$$ is a proximity operator on a suitable Hilbert space. The concatenation of proximity operators on $$\mathbb{R}^d$$ equipped with different norms establishes a PNN. If the network arises from tight frame analysis or synthesis operators, then it forms an averaged operator. In particular, it has Lipschitz constant 1 and belongs to the class of so-called Lipschitz networks, which were recently applied to defend against adversarial attacks. Moreover, due to their averaging property, PNNs can be used within so-called Plug-and-Play algorithms with convergence guarantees. In the case of Parseval frames, we call the networks Parseval proximal neural networks (PPNNs). Then the involved linear operators lie in a Stiefel manifold, and corresponding minimization methods can be applied to train such networks. Finally, some proof-of-concept examples demonstrate the performance of PPNNs.
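
As a rough illustration of the frame shrinkage operator $$T^\dagger \, S_\lambda (T \cdot + b)$$ described above, the following minimal sketch builds a single layer from a small Parseval frame and checks numerically that it is nonexpansive. It is not the authors' implementation. The three-vector "Mercedes-Benz" frame in $$\mathbb{R}^2$$, the bias `b`, the threshold `0.5`, and all function names are assumptions chosen for this example. Because $$T^\top T = I$$ for a Parseval frame, the Moore–Penrose inverse reduces to the transpose, and the estimate $$\Vert T^\top S_\lambda(Tx+b) - T^\top S_\lambda(Ty+b)\Vert \le \Vert Tx - Ty\Vert = \Vert x - y\Vert$$ suggests a Lipschitz constant of at most 1.

```python
import math
import random

# Assumption: a three-vector Parseval ("Mercedes-Benz") frame in R^2,
# picked only for illustration; its rows are the frame vectors.
ANGLES = [math.pi / 2 + 2 * math.pi * k / 3 for k in range(3)]
SCALE = math.sqrt(2.0 / 3.0)
T = [[SCALE * math.cos(a), SCALE * math.sin(a)] for a in ANGLES]  # 3x2 matrix


def analysis(x):
    """Apply the analysis operator T: R^2 -> R^3."""
    return [row[0] * x[0] + row[1] * x[1] for row in T]


def synthesis(c):
    """Apply T^T: R^3 -> R^2 (equals the pseudoinverse since T^T T = I)."""
    return [sum(T[k][j] * c[k] for k in range(3)) for j in range(2)]


def soft_shrink(c, lam):
    """Componentwise soft shrinkage S_lambda with threshold lam >= 0."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in c]


def ppnn_layer(x, b, lam):
    """One hypothetical layer x -> T^T S_lambda(T x + b)."""
    shifted = [v + bk for v, bk in zip(analysis(x), b)]
    return synthesis(soft_shrink(shifted, lam))


def dist(x, y):
    """Euclidean distance between two points in R^2."""
    return math.hypot(x[0] - y[0], x[1] - y[1])


# Empirical check: the largest observed ratio of output distance to
# input distance over random pairs should not exceed 1.
random.seed(0)
b = [0.1, -0.2, 0.05]
worst = 0.0
for _ in range(1000):
    x = [random.uniform(-3, 3), random.uniform(-3, 3)]
    y = [random.uniform(-3, 3), random.uniform(-3, 3)]
    d = dist(x, y)
    if d > 0:
        ratio = dist(ppnn_layer(x, b, 0.5), ppnn_layer(y, b, 0.5)) / d
        worst = max(worst, ratio)

print("largest observed ratio:", worst)
```

A random search like this only gives a lower bound on the true Lipschitz constant, so it can support but not prove the nonexpansiveness claim. In a trainable setting the rows of `T` would be learned under the constraint $$T^\top T = I$$, which is where the Stiefel manifold optimization mentioned in the abstract comes in.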
