Learning both Weights and Connections for Efficient Neural Networks

Neural networks are both computationally and memory intensive, making them difficult to deploy on embedded systems. Moreover, conventional networks fix the architecture before training begins, so training cannot improve the architecture. To address these limitations, we describe a method that reduces the storage and computation required by neural networks by an order of magnitude, without affecting their accuracy, by learning only the important connections. Our method prunes redundant connections in three steps. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine-tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the number of parameters can be reduced by 13x, from 138 million to 10.3 million, again with no loss of accuracy.
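
To make the three-step pipeline concrete, the sketch below shows one way to implement it with magnitude-based pruning in PyTorch: train normally, zero out the smallest-magnitude weights, then fine-tune while re-applying binary masks so pruned connections stay at zero. This is a minimal sketch under our own assumptions (PyTorch, a toy model, a single 90% sparsity target, per-layer magnitude thresholds); it is not the paper's exact procedure or hyperparameters.

```python
# Minimal sketch of the train -> prune -> retrain pipeline from the abstract.
# The toy model, 90% sparsity target, and per-layer magnitude threshold are
# illustrative assumptions, not the paper's published settings.
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.9) -> dict:
    """Step 2: zero the smallest-magnitude weights; return binary masks."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                  # leave biases dense
            continue
        flat = param.detach().abs().flatten()
        k = int(sparsity * flat.numel())
        if k == 0:
            continue
        threshold = flat.kthvalue(k).values  # k-th smallest magnitude
        mask = (param.detach().abs() > threshold).float()
        param.data.mul_(mask)                # prune unimportant connections
        masks[name] = mask
    return masks

def apply_masks(model: nn.Module, masks: dict) -> None:
    """Step 3 helper: re-zero pruned weights after each optimizer step,
    so retraining only fine-tunes the surviving connections."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

# Usage: after normal training (step 1), prune, then fine-tune,
# calling apply_masks(model, masks) after every optimizer.step().
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = magnitude_prune(model, sparsity=0.9)
```

Re-applying the masks after each optimizer step is one simple way to keep pruned weights at zero during retraining; the compression ratios reported above also depend on details this sketch omits, such as the paper's layer-wise threshold choice and iterative prune-retrain schedule.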
