Holistic SparseCNN: Forging the Trident of Accuracy, Speed, and Size

We present Holistic SparseCNN, a sparse convolutional neural network design that simultaneously optimizes convolution layers for classification speed and fully connected layers for model size, while maintaining accuracy. We apply convolutions directly to tensors, without a bandwidth-wasting lowering step, which is critical because sparse convolution is more prone to being bandwidth bound than its dense counterpart. Our cross-layer training method balances sparsity among multiple layers to optimize the trade-off between accuracy, speed, and model size, guided by the characteristics of the underlying computing platforms. We demonstrate overall classification throughputs significantly higher than the best published numbers on Intel Xeon and Atom processors, which represent datacenter servers and resource-constrained mobile platforms, respectively. Our project is open-sourced at https://github.com/IntelLabs/SkimCaffe.
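
To make the "no lowering step" idea concrete, the following is a minimal sketch of direct sparse convolution: pruned weights are kept in a CSR-like format with one row per output channel, and each nonzero weight is streamed over the input tensor in place, instead of multiplying against an im2col-lowered matrix. This is an illustrative sketch only, not the SkimCaffe implementation; it assumes unit stride, no padding, and the hypothetical names `SparseWeights` and `direct_sparse_conv`.

```c
/*
 * Sketch of direct sparse convolution (no im2col lowering).
 * Assumptions (not from the paper text): unit stride, no padding,
 * weights already pruned and stored in CSR format with one row per
 * output channel; column indices flatten (input_channel, kernel_row, kernel_col).
 */
#include <stddef.h>

typedef struct {
    int rows;        /* number of output channels */
    int *rowptr;     /* CSR row pointers, length rows + 1 */
    int *colidx;     /* flattened (ic, r, s) index per nonzero */
    float *val;      /* nonzero weight values */
} SparseWeights;

/* in:  C x H x W input tensor (row-major)
 * out: rows x Hout x Wout output tensor, Hout = H-K+1, Wout = W-K+1 */
static void direct_sparse_conv(const SparseWeights *w,
                               const float *in, int C, int H, int W,
                               float *out, int K)
{
    (void)C; /* C is implied by the column indices; kept for clarity */
    int Hout = H - K + 1, Wout = W - K + 1;

    for (int oc = 0; oc < w->rows; ++oc) {
        float *out_oc = out + (size_t)oc * Hout * Wout;
        for (int i = 0; i < Hout * Wout; ++i)
            out_oc[i] = 0.0f;

        /* Stream each nonzero weight over the whole output plane,
         * reading the input tensor directly instead of a lowered matrix. */
        for (int nz = w->rowptr[oc]; nz < w->rowptr[oc + 1]; ++nz) {
            int flat = w->colidx[nz];
            int ic = flat / (K * K);
            int r  = (flat / K) % K;
            int s  = flat % K;
            float v = w->val[nz];
            const float *in_ic = in + (size_t)ic * H * W;

            for (int y = 0; y < Hout; ++y)
                for (int x = 0; x < Wout; ++x)
                    out_oc[y * Wout + x] += v * in_ic[(y + r) * W + (x + s)];
        }
    }
}
```

Because the loop skips zero weights entirely and never materializes a lowered input matrix, its memory traffic scales with the number of nonzeros and the tensor sizes, which is the property the abstract relies on when arguing that sparse convolution must avoid becoming bandwidth bound.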
