Tensor Contraction & Regression Networks

Convolutional neural networks typically consist of many convolutional layers followed by several fully-connected layers. While the convolutional layers map between high-order activation tensors, the fully-connected layers operate on flattened activation vectors. Despite the success of this approach, it has notable drawbacks: flattening discards the multi-dimensional structure of the activations, and the fully-connected layers require a large number of parameters. We present two new techniques to address these problems. First, we introduce tensor contraction layers, which can replace the ordinary fully-connected layers in a neural network. Second, we introduce tensor regression layers, which express the output of a neural network as a low-rank multi-linear mapping from a high-order activation tensor to the softmax layer. Both the contraction and regression weights are learned end-to-end by backpropagation, and by imposing low rank on both we use significantly fewer parameters. Experiments on the ImageNet dataset show that, when applied to the popular VGG and ResNet architectures, our methods significantly reduce the number of parameters in the fully-connected layers (about 65% space savings) while negligibly impacting accuracy.
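To make the two layers concrete, below is a minimal NumPy sketch of a tensor contraction layer followed by a Tucker-structured tensor regression layer. The shapes, ranks, and variable names are illustrative assumptions rather than the paper's exact implementation, and the randomly initialized factors stand in for weights that would in practice be learned end-to-end by backpropagation.

```python
# Illustrative sketch only: shapes, ranks, and names are assumptions,
# and random factors stand in for learned weights.
import numpy as np

rng = np.random.default_rng(0)

# A batch of order-3 activation tensors, e.g. (batch, channels, height, width).
batch, C, H, W = 8, 16, 6, 6
X = rng.standard_normal((batch, C, H, W))

# Tensor contraction layer: contract each mode with a small factor matrix,
# shrinking the activation tensor without flattening it.
rC, rH, rW = 8, 3, 3                      # contraction ranks (assumed)
Vc = rng.standard_normal((rC, C))
Vh = rng.standard_normal((rH, H))
Vw = rng.standard_normal((rW, W))
X_contracted = np.einsum('bchw,ic,jh,kw->bijk', X, Vc, Vh, Vw)
print(X_contracted.shape)                 # (8, 8, 3, 3)

# Tensor regression layer: the full regression weight tensor of shape
# (rC, rH, rW, n_classes) is never formed; it is parameterized by a
# low-rank Tucker core and factor matrices.
n_classes = 10
r1, r2, r3, r4 = 4, 2, 2, 10              # Tucker ranks (assumed)
G  = rng.standard_normal((r1, r2, r3, r4))
U1 = rng.standard_normal((rC, r1))
U2 = rng.standard_normal((rH, r2))
U3 = rng.standard_normal((rW, r3))
U4 = rng.standard_normal((n_classes, r4))
bias = np.zeros(n_classes)

# Logits: inner product of the activation tensor with the implicit weight tensor.
logits = np.einsum('bijk,ip,jq,kr,pqrs,ns->bn',
                   X_contracted, U1, U2, U3, G, U4) + bias
print(logits.shape)                       # (8, 10)
```

Because the full regression weight tensor is never materialized, the parameter count is governed by the contraction and Tucker ranks rather than by the product of the activation dimensions, which is where the space savings over a flattened fully-connected layer come from.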
