On improving deep learning generalization with adaptive sparse connectivity

Large neural networks are highly successful on a wide range of tasks. However, when training data are limited, the generalization capabilities of deep neural networks are also limited. In this paper, we provide initial empirical evidence that intrinsically sparse neural networks with adaptive sparse connectivity, which by design have a strict parameter budget during training, generalize better than their fully-connected counterparts. In addition, we propose a new technique to train these sparse models by combining the Sparse Evolutionary Training (SET) procedure with neuron pruning. Applied to Multilayer Perceptrons (MLPs) and evaluated on 15 datasets, the proposed technique zeros out around 50% of the hidden neurons during training while keeping the number of parameters to optimize linear in the number of neurons. The results show competitive classification and generalization performance.

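The procedure sketched in the abstract amounts to two steps per training epoch: a SET-style prune-and-regrow update that keeps the number of connections fixed, followed by a neuron-pruning pass that removes hidden units left without connections. The NumPy sketch below is a minimal illustration of those two steps under assumed details: the Erdős–Rényi sparsity level `epsilon`, the pruning fraction `zeta`, and the "no incoming connections" removal criterion are illustrative choices, not necessarily the exact settings used in the paper.

```python
# Minimal sketch (NumPy only) of adaptive sparse connectivity with neuron pruning.
# Hyperparameters and the neuron-removal criterion are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def init_sparse_weights(n_in, n_out, epsilon=20):
    """Erdos-Renyi sparse init: keep each connection with probability
    epsilon * (n_in + n_out) / (n_in * n_out)."""
    p = min(1.0, epsilon * (n_in + n_out) / (n_in * n_out))
    mask = rng.random((n_in, n_out)) < p
    w = rng.standard_normal((n_in, n_out)) * 0.1
    return w * mask, mask

def set_prune_and_regrow(w, mask, zeta=0.3):
    """One SET topology update: remove the fraction `zeta` of existing
    connections with the smallest magnitude, then regrow the same number
    of connections at randomly chosen empty positions."""
    magnitudes = np.abs(w[mask])
    n_remove = int(zeta * magnitudes.size)
    if n_remove == 0:
        return w, mask
    threshold = np.sort(magnitudes)[n_remove]
    keep = mask & (np.abs(w) >= threshold)          # prune weakest connections
    empty = np.argwhere(~keep)                      # candidate positions for regrowth
    grow_idx = empty[rng.choice(len(empty), size=mask.sum() - keep.sum(), replace=False)]
    new_mask = keep.copy()
    new_mask[grow_idx[:, 0], grow_idx[:, 1]] = True
    new_w = w * keep
    new_w[grow_idx[:, 0], grow_idx[:, 1]] = rng.standard_normal(len(grow_idx)) * 0.1
    return new_w, new_mask

def prune_dead_neurons(w_in, mask_in, w_out, mask_out):
    """Neuron pruning: drop hidden units with no incoming connections
    after the SET update (an assumed, simple criterion)."""
    alive = mask_in.sum(axis=0) > 0
    return (w_in[:, alive], mask_in[:, alive],
            w_out[alive, :], mask_out[alive, :])

# Usage: one hidden layer, one topology-update step.
w1, m1 = init_sparse_weights(784, 256)
w2, m2 = init_sparse_weights(256, 10)
# ... train the non-zero weights for an epoch here ...
w1, m1 = set_prune_and_regrow(w1, m1)
w1, m1, w2, m2 = prune_dead_neurons(w1, m1, w2, m2)
print("hidden neurons kept:", w1.shape[1])
```

Because the regrow step samples new connections uniformly at random, the total connection count (and hence the parameter budget) stays fixed across epochs, while the neuron-pruning pass is what progressively shrinks the hidden layer.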