Spectral Tensor Train Parameterization of Deep Learning Layers

We study low-rank parameterizations of weight matrices with embedded spectral properties in the Deep Learning context. The low-rank property yields parameter efficiency and permits computational shortcuts when evaluating the layer mapping. Spectral properties are often subject to constraints in optimization problems, leading to better models and more stable optimization. We start from the compact SVD parameterization of weight matrices and identify the sources of redundancy in this parameterization. We then apply the Tensor Train (TT) decomposition to the compact SVD components and propose a non-redundant, differentiable parameterization of fixed TT-rank tensor manifolds, termed the Spectral Tensor Train Parameterization (STTP). We demonstrate neural network compression in the image classification setting, and both compression and improved training stability in the generative adversarial training setting.
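
To make the starting point concrete, below is a minimal sketch of a compact-SVD-parameterized linear layer, W = U diag(s) V^T, with U and V constrained to have orthonormal columns so that the entries of s are the singular values of W. This is only an illustration of the compact SVD baseline, not the STTP construction itself (which additionally removes redundancy in U and V and factorizes them with a Tensor Train decomposition); the class name, the chosen rank, and the use of PyTorch's orthogonal parametrization utility are assumptions of this sketch.

import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class CompactSVDLinear(nn.Module):
    """Illustrative low-rank layer parameterized as W = U diag(s) V^T (compact SVD).

    U (out x r) and V (in x r) are kept with orthonormal columns via PyTorch's
    orthogonal parametrization, so `s` holds the singular values of W and can be
    constrained or regularized directly. Sketch only; not the paper's STTP layer.
    """

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.empty(out_features, rank))
        self.V = nn.Parameter(torch.empty(in_features, rank))
        self.s = nn.Parameter(torch.ones(rank))  # singular values
        nn.init.orthogonal_(self.U)
        nn.init.orthogonal_(self.V)
        # Keep U and V orthonormal throughout training (Stiefel constraint).
        orthogonal(self, "U")
        orthogonal(self, "V")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x W^T computed as ((x V) * s) U^T: O(r(m+n)) work per sample
        # instead of O(mn) for a dense m-by-n weight matrix.
        return ((x @ self.V) * self.s) @ self.U.t()


layer = CompactSVDLinear(in_features=512, out_features=256, rank=16)
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 256])

The factorized forward pass is where the computational shortcut mentioned above comes from: the dense weight is never materialized, and the spectrum is exposed as an explicit parameter vector that spectral constraints can act on.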
