Learning Across Scales - Multiscale Methods for Convolutional Neural Networks

In this work we establish a connection between optimal control and the training of deep Convolutional Neural Networks (CNNs). We show that forward propagation in a CNN can be interpreted as a time-dependent nonlinear differential equation, and learning as the problem of controlling the parameters of that equation so that the network approximates the data-label relation for given training data. Using this continuous interpretation we derive two new classes of multiscale methods that scale CNNs along two different dimensions. The first class connects low-resolution and high-resolution data through prolongation and restriction of CNN parameters; we demonstrate that this enables classifying high-resolution images with CNNs trained on low-resolution images (and vice versa) and warm-starting the learning process. The second class connects shallow and deep networks and leads to new training strategies that gradually increase the depth of the CNN while re-using the parameters of shallower networks as initializations.
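
To make the continuous interpretation concrete, here is a minimal sketch (not the authors' implementation) of the depth-multiscale idea. It assumes the common ResNet-as-forward-Euler view, Y_{j+1} = Y_j + h * sigma(K_j Y_j + b_j), and initializes a twice-as-deep network by piecewise-linear interpolation of the time-dependent parameters theta(t) = (K(t), b(t)). Dense matrices stand in for convolution operators, and the names forward_euler_resnet and prolong_in_depth are illustrative, not from the paper.

```python
import numpy as np

def forward_euler_resnet(Y, Ks, bs, h):
    """Forward propagation as an explicit Euler scheme in 'time' (depth):
    one residual block corresponds to one time step of size h."""
    for K, b in zip(Ks, bs):
        Y = Y + h * np.tanh(K @ Y + b)
    return Y

def prolong_in_depth(Ks, bs):
    """Warm-start a twice-as-deep network by piecewise-linear interpolation
    of the layer parameters, viewed as samples of theta(t) on a time grid."""
    fine_Ks, fine_bs = [], []
    for j in range(len(Ks)):
        fine_Ks.append(Ks[j])
        fine_bs.append(bs[j])
        if j + 1 < len(Ks):
            # midpoint between two existing time steps
            fine_Ks.append(0.5 * (Ks[j] + Ks[j + 1]))
            fine_bs.append(0.5 * (bs[j] + bs[j + 1]))
        else:
            # constant extrapolation at the final time
            fine_Ks.append(Ks[j])
            fine_bs.append(bs[j])
    return fine_Ks, fine_bs

# Usage: propagate through a shallow net, then through its prolonged,
# twice-as-deep counterpart with half the step size (same final time).
rng = np.random.default_rng(0)
n, N = 8, 4                       # feature dimension, number of layers
Ks = [0.1 * rng.standard_normal((n, n)) for _ in range(N)]
bs = [np.zeros((n, 1)) for _ in range(N)]
Y0 = rng.standard_normal((n, 1))

Y_coarse = forward_euler_resnet(Y0, Ks, bs, h=1.0 / N)
Ks2, bs2 = prolong_in_depth(Ks, bs)
Y_fine = forward_euler_resnet(Y0, Ks2, bs2, h=1.0 / (2 * N))
print(np.linalg.norm(Y_fine - Y_coarse))  # small: the deep net starts nearby
```

Because both networks discretize the same underlying differential equation, the deeper network's initial output stays close to that of the shallow one, which is what makes re-using the shallow parameters an effective warm start.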
