ProdSumNet: reducing model parameters in deep neural networks via product-of-sums matrix decompositions

We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed architectures such as CNN, KFC, and dilated CNN are all subsumed by this framework, and we illustrate other types of neural network architectures that it covers. We show that good accuracy on MNIST and Fashion-MNIST can be obtained using a relatively small number of trainable parameters. In addition, since the convolutional layer is resource-intensive to implement, we consider an approach in the transform domain that obviates the need for convolutional layers. One advantage of this general framework over prior approaches is that the number of trainable parameters is not fixed and can be varied arbitrarily; in particular, we illustrate the tradeoff between the number of trainable parameters and the corresponding error rate. As an example, applying this decomposition to a reference CNN architecture for MNIST with over 3x10^6 trainable parameters, we obtain an accuracy of 98.44% using only 3554 trainable parameters.
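To make the product-of-sums idea concrete, here is a minimal PyTorch sketch (the class name ProductOfSumsLinear, the arguments num_factors and num_bases, and the random sign-matrix bases are our own assumptions for illustration, not the authors' implementation): a square weight matrix is parameterized as W = prod_k ( sum_i alpha[k, i] * B[k, i] ), where the B[k, i] are fixed simple operators and only the mixing coefficients alpha are trained.

```python
import torch
import torch.nn as nn


class ProductOfSumsLinear(nn.Module):
    """Sketch of a linear layer whose weight is a product of sums of fixed
    basis operators; only the mixing coefficients alpha are trainable."""

    def __init__(self, dim: int, num_factors: int, num_bases: int):
        super().__init__()
        # Fixed (non-trainable) simple operators B[k, i]; random sign
        # matrices are used here purely for illustration.
        self.register_buffer(
            "bases", torch.randn(num_factors, num_bases, dim, dim).sign()
        )
        # Trainable mixing coefficients alpha[k, i]:
        # num_factors * num_bases parameters instead of dim * dim.
        self.alpha = nn.Parameter(0.1 * torch.randn(num_factors, num_bases))

    def weight(self) -> torch.Tensor:
        # W = prod_k ( sum_i alpha[k, i] * B[k, i] )
        dim = self.bases.shape[-1]
        W = torch.eye(dim, dtype=self.bases.dtype, device=self.bases.device)
        for k in range(self.bases.shape[0]):
            S_k = torch.einsum("i,irc->rc", self.alpha[k], self.bases[k])
            W = S_k @ W
        return W

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -> (batch, dim)
        return x @ self.weight().t()
```

Under these illustrative assumptions, dim = 784, num_factors = 4, and num_bases = 8 give a layer with only 32 trainable parameters in place of the 614,656 of a dense 784x784 matrix, which is the kind of parameter-count tradeoff described above.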
