Exploiting Local Structures with the Kronecker Layer in Convolutional Networks

In this paper, we propose and study a technique to reduce the number of parameters and the computation time in convolutional neural networks. We use the Kronecker product to exploit the local structures within convolutional and fully-connected layers, replacing large weight matrices with combinations of multiple Kronecker products of smaller matrices. Just as the Kronecker product is a generalization of the outer product from vectors to matrices, our method is a generalization of the low-rank approximation method for convolutional neural networks. We also introduce combinations of Kronecker products of different shapes to increase modeling capacity. Experiments on the SVHN, scene text recognition, and ImageNet datasets demonstrate that we can achieve a $3.3 \times$ speedup or a $3.6 \times$ parameter reduction with less than a 1\% drop in accuracy, showing the effectiveness and efficiency of our method. Moreover, the computational efficiency of the Kronecker layer makes it feasible to use larger feature maps, which in turn enables us to outperform the previous state of the art on both the SVHN (digit recognition) and CASIA-HWDB (handwritten Chinese character recognition) datasets.
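To make the core idea concrete, the sketch below shows a fully-connected layer whose weight matrix is approximated by a sum of Kronecker products, evaluated without ever materializing the full matrix. This is a minimal illustration in NumPy under our own assumptions about shapes and conventions (the function name `kron_layer`, the factor shapes, and the number of terms are ours, not the authors' implementation); it uses the standard identity $x (A \otimes B) = \mathrm{vec}(A^{\top} X B)$ with $X$ the suitably reshaped input, which is where the speedup comes from.

```python
import numpy as np

def kron_layer(x, factors):
    """Hypothetical Kronecker layer: computes y = x @ W where
    W = sum_k kron(A_k, B_k), with A_k of shape (m1, n1) and
    B_k of shape (m2, n2), so W has shape (m1*m2, n1*n2).

    Each term is evaluated via x @ kron(A, B) = vec(A^T X B),
    with X = x reshaped to (m1, m2), costing
    O(m1*n1*m2 + n1*m2*n2) instead of O(m1*m2*n1*n2).
    """
    batch = x.shape[0]
    out = 0.0
    for A, B in factors:
        (m1, n1), (m2, n2) = A.shape, B.shape
        X = x.reshape(batch, m1, m2)
        # Y[b, j, q] = sum_{i, p} A[i, j] * X[b, i, p] * B[p, q]
        Y = np.einsum('ij,bip,pq->bjq', A, X, B)
        out = out + Y.reshape(batch, n1 * n2)
    return out

# Sanity check against the dense weight matrix being approximated.
rng = np.random.default_rng(0)
m1, n1, m2, n2, r = 4, 3, 5, 6, 2
factors = [(rng.standard_normal((m1, n1)),
            rng.standard_normal((m2, n2))) for _ in range(r)]
W = sum(np.kron(A, B) for A, B in factors)
x = rng.standard_normal((7, m1 * m2))
assert np.allclose(kron_layer(x, factors), x @ W)
```

Two parameters per term (an $m_1 \times n_1$ and an $m_2 \times n_2$ matrix) replace the $m_1 m_2 \times n_1 n_2$ dense weight, and mixing terms with different factor shapes, as the abstract describes, is what recovers modeling capacity beyond a single low-rank factorization.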
