Pruned and Structurally Sparse Neural Networks

Advances in designing and training deep neural networks have led to the principle that the larger and deeper a network is, the better it can perform. As a result, computational resources have become a key limiting factor in achieving better performance. One strategy for improving network capability while reducing the required computation is to replace dense fully-connected and convolutional layers with sparse layers. In this paper, we experiment with training on sparse neural network topologies. First, we test pruning-based sparse topologies, which use a network topology obtained by initially training a dense network and then pruning low-weight connections. Second, we test RadiX-Nets, a class of sparse network structures with proven connectivity and sparsity properties. Results show that, compared to dense topologies, sparse structures show promise in training potential but can also exhibit highly nonlinear convergence, which merits further study.
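As context for the pruning-based topologies described above, the sketch below illustrates one common form of magnitude pruning: after training a dense layer, the lowest-magnitude weights are removed and the resulting binary mask defines the sparse topology. This is a minimal NumPy sketch, not the exact procedure used in the experiments; the layer shape, sparsity level, and function name are placeholders chosen for illustration.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a boolean mask that zeroes out the lowest-magnitude weights.

    `sparsity` is the fraction of connections to remove (e.g. 0.9 keeps 10%).
    """
    k = int(sparsity * weights.size)
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    # Threshold at the k-th smallest absolute weight.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

# Example: derive a sparse topology from a (hypothetical) trained dense layer.
dense_weights = np.random.randn(784, 300)      # stand-in for trained weights
mask = magnitude_prune(dense_weights, sparsity=0.9)
sparse_weights = dense_weights * mask          # pruned layer, ~10% nonzeros
```

In a pruning-based training setup, the mask would typically be held fixed and reapplied after each weight update so that only the surviving connections are trained.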
