Network pruning has emerged as a powerful technique for reducing the size of deep neural networks. Pruning uncovers high-performance subnetworks by taking a trained dense network and gradually removing unimportant connections. Recently, alternative techniques have emerged for training sparse networks directly, without first training a large dense model, thereby achieving a small memory footprint during both training and inference. These techniques rely on dynamically reallocating the non-zero parameters during training; in effect, they perform a training-time search for an optimal subnetwork. We investigate one of the most recent of these techniques and conduct additional experiments to elucidate its behavior when training sparse deep convolutional networks. We find that dynamic parameter reallocation converges early in training to a highly trainable subnetwork. We also show that neither the structure nor the initialization of the discovered high-performance subnetwork is sufficient to explain its good performance; rather, it is the dynamics of parameter reallocation that are responsible for successful learning. Dynamic parameter reallocation thus improves the trainability of deep convolutional networks, playing a role similar to that of overparameterization without incurring its memory and computational cost.
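The mechanism under study can be summarized as a prune-and-regrow cycle applied to a fixed budget of non-zero weights. Below is a minimal sketch of that general idea, assuming a magnitude-based prune rule and uniform random regrowth; the helper name `reallocate` and the `prune_frac` parameter are illustrative choices, not the exact reparameterization scheme evaluated in this work.

```python
import torch

def reallocate(weight: torch.Tensor, mask: torch.Tensor, prune_frac: float = 0.2) -> torch.Tensor:
    """Prune the weakest surviving connections and regrow the same number elsewhere,
    keeping the total number of non-zero parameters fixed (illustrative sketch)."""
    with torch.no_grad():
        active = mask.bool()
        n_active = int(active.sum())
        k = int(prune_frac * n_active)
        if k == 0:
            return mask

        # 1) Prune: zero out the k active weights with the smallest magnitude.
        magnitudes = weight.abs().masked_fill(~active, float("inf"))
        _, drop_idx = torch.topk(magnitudes.flatten(), k, largest=False)
        mask.view(-1)[drop_idx] = 0.0
        weight.view(-1)[drop_idx] = 0.0

        # 2) Regrow: re-activate k currently-zero positions chosen uniformly at
        #    random; regrown weights start at zero and are trained from there.
        #    (For simplicity this may occasionally re-select a just-pruned slot.)
        inactive_idx = (mask.view(-1) == 0).nonzero(as_tuple=False).flatten()
        grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:k]]
        mask.view(-1)[grow_idx] = 1.0
    return mask
```

In a training loop, such a step would be applied to each sparse layer every few hundred updates, with the forward pass using the masked weights `weight * mask`; the adaptive, per-layer redistribution of the parameter budget used by the actual method is omitted here for brevity.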