Training of deep residual networks with stochastic MG/OPT

We train deep residual networks with a stochastic variant of the nonlinear multigrid method MG/OPT. To build the multilevel hierarchy, we exploit the dynamical-systems viewpoint specific to residual networks. We report significant speedups and added robustness when training deep residual networks on MNIST. Our numerical experiments also indicate that multilevel training can serve as a pruning technique, as many of the auxiliary networks reach accuracies comparable to the original network.
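
To make the method concrete, below is a minimal two-level sketch of an MG/OPT-style V-cycle for a toy residual network, assuming the dynamical-systems coarsening (the coarse network keeps every other residual block and doubles the step size h). All helper names (restrict, prolong, mgopt_vcycle, and so on) and the hyperparameters are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # one forward-Euler step: x <- x + h * tanh(W x + b)
    def __init__(self, width):
        super().__init__()
        self.lin = nn.Linear(width, width)
    def forward(self, x, h):
        return x + h * torch.tanh(self.lin(x))

class ResNet(nn.Module):
    # residual network read as a discretized ODE with step size h
    def __init__(self, width, n_blocks, h):
        super().__init__()
        self.blocks = nn.ModuleList(ResBlock(width) for _ in range(n_blocks))
        self.h = h
    def forward(self, x):
        for blk in self.blocks:
            x = blk(x, self.h)
        return x

def get_flat(net):
    return torch.cat([p.detach().flatten() for p in net.parameters()])

def set_flat(net, vec):
    i = 0
    for p in net.parameters():
        n = p.numel()
        p.data.copy_(vec[i:i + n].view_as(p))
        i += n

def flat_grad(net, loss_fn, batch):
    net.zero_grad()
    loss_fn(net, batch).backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

def dot_params(net, vec):
    # <vec, theta> against the live parameters, so it stays differentiable
    i, out = 0, 0.0
    for p in net.parameters():
        n = p.numel()
        out = out + (p * vec[i:i + n].view_as(p)).sum()
        i += n
    return out

def restrict(fine_vec, width):
    # injection: keep the parameters of every other residual block
    per_block = width * width + width  # weight + bias of one block
    return fine_vec.view(-1, per_block)[::2].flatten()

def prolong(coarse_vec, width):
    # piecewise-constant interpolation: each coarse block's correction
    # is copied onto the two fine blocks it stands for
    per_block = width * width + width
    return coarse_vec.view(-1, per_block).repeat_interleave(2, dim=0).flatten()

def sgd_steps(net, loss_fn, batch, k, lr):
    # the stochastic variant would draw a fresh minibatch per step;
    # one fixed batch is reused here for brevity
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(k):
        opt.zero_grad()
        loss_fn(net, batch).backward()
        opt.step()

def mgopt_vcycle(fine, coarse, loss_fn, batch, width, lr=1e-2):
    sgd_steps(fine, loss_fn, batch, k=2, lr=lr)          # pre-smoothing
    theta_h = get_flat(fine)
    theta_H = restrict(theta_h, width)
    set_flat(coarse, theta_H)
    # first-order consistency term: v = R grad_h - grad_H(R theta_h)
    v = restrict(flat_grad(fine, loss_fn, batch), width) \
        - flat_grad(coarse, loss_fn, batch)
    corrected = lambda net, b: loss_fn(net, b) - dot_params(net, v)
    sgd_steps(coarse, corrected, batch, k=4, lr=lr)      # coarse-level "solve"
    e = prolong(get_flat(coarse) - theta_H, width)       # coarse correction
    set_flat(fine, theta_h + e)                          # (a line search is typical here)
    sgd_steps(fine, loss_fn, batch, k=2, lr=lr)          # post-smoothing

# toy usage: 8 fine blocks (h = 0.25) coarsened to 4 blocks (h = 0.5)
width = 8
fine, coarse = ResNet(width, 8, 0.25), ResNet(width, 4, 0.5)
x, y = torch.randn(32, width), torch.randn(32, width)
loss_fn = lambda net, b: nn.functional.mse_loss(net(b[0]), b[1])
mgopt_vcycle(fine, coarse, loss_fn, (x, y), width)
```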
