Tensor Contraction Layers for Parsimonious Deep Nets

Tensors offer a natural representation for many kinds of data frequently encountered in machine learning. Images, for example, are naturally represented as third-order tensors, whose modes correspond to height, width, and channels. In particular, tensor decompositions are noted for their ability to discover multi-dimensional dependencies and to produce compact low-rank approximations of data. In this paper, we explore the use of tensor contractions as neural network layers and investigate several ways to apply them to activation tensors. Specifically, we propose the Tensor Contraction Layer (TCL), the first attempt to incorporate tensor contractions as end-to-end trainable neural network layers. Applied to existing networks, TCLs reduce the dimensionality of the activation tensors and thus the number of model parameters. We evaluate the TCL on image recognition by augmenting popular networks (AlexNet, VGG); the resulting models remain trainable end-to-end. On the CIFAR-100 and ImageNet datasets, we study the effect of parameter reduction via tensor contraction on performance, and we demonstrate significant model compression with little to no loss in accuracy and, in some cases, improved performance.
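
To make the operation concrete, the sketch below shows one way a tensor contraction layer can be realized: each non-batch mode of the activation tensor is contracted with a learnable factor matrix, shrinking the tensor before any flattening or fully connected layer. This is a minimal illustrative sketch in PyTorch, assuming a Tucker-style sequence of mode products with randomly initialized factors; the class name, shapes, and initialization are assumptions for the example, not the paper's reference implementation.

```python
# Minimal sketch of a Tensor Contraction Layer (TCL) in PyTorch.
# Illustrative only: names, shapes, and initialization are assumptions.
import torch
import torch.nn as nn

class TensorContractionLayer(nn.Module):
    """Contracts each non-batch mode of an activation tensor with a
    learnable factor matrix, yielding a smaller core tensor."""
    def __init__(self, input_shape, output_shape):
        # input_shape / output_shape: per-mode sizes, e.g. (C, H, W)
        super().__init__()
        assert len(input_shape) == len(output_shape)
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
             for in_dim, out_dim in zip(input_shape, output_shape)]
        )

    def forward(self, x):
        # x: (batch, d1, ..., dN); contract mode k with factors[k-1]
        for mode, factor in enumerate(self.factors, start=1):
            x = torch.moveaxis(x, mode, -1)   # bring mode to the last axis
            x = x @ factor.t()                # (..., in_dim) -> (..., out_dim)
            x = torch.moveaxis(x, -1, mode)   # restore the original axis order
        return x

# Example: shrink AlexNet-style activations (256, 6, 6) to (128, 4, 4)
if __name__ == "__main__":
    tcl = TensorContractionLayer((256, 6, 6), (128, 4, 4))
    activations = torch.randn(32, 256, 6, 6)  # batch of 32
    print(tcl(activations).shape)             # torch.Size([32, 128, 4, 4])
```

Because the contraction reduces each mode before the first fully connected layer, the number of parameters in that layer (typically the bulk of AlexNet- or VGG-style models) shrinks accordingly, which is the source of the compression discussed above.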
