Spline parameterization of neural network controls for deep learning

Based on the continuous interpretation of deep learning as an optimal control problem, this paper investigates the benefits of employing B-spline basis functions to parameterize neural network controls across the layers. Rather than equipping each layer of a discretized ODE-network with its own set of trainable weights, we choose a fixed number of B-spline basis functions whose coefficients are the trainable parameters of the neural network. Decoupling the trainable parameters from the layers of the neural network enables us to investigate and adapt the accuracy of the network propagation separately from the learning problem. We show numerically that the spline-based neural network increases the robustness of the learning problem with respect to hyperparameters, owing to the improved stability and accuracy of the network propagation. Further, training on B-spline coefficients rather than directly on layer weights reduces the number of trainable parameters.
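As a rough illustration of this parameterization (a minimal sketch, not the authors' implementation), the code below builds a dense forward-Euler residual network in PyTorch whose layer weights W(t_k) are reconstructed at each layer time point from a small set of trainable B-spline coefficient tensors. The names bspline_basis and SplineResNet, the clamped knot vector, the tanh activation, and all hyperparameter values are illustrative assumptions.

import numpy as np
import torch
import torch.nn as nn


def bspline_basis(t, knots, degree):
    # Evaluate all B-spline basis functions of the given degree at the scalar
    # time t via the Cox-de Boor recursion; returns len(knots) - degree - 1 values.
    knots = np.asarray(knots, dtype=float)
    m = len(knots) - 1
    # Degree-0 basis: indicator functions of the half-open knot spans.
    N = np.array([1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(m)])
    if t >= knots[-1]:
        # Close the last non-empty span so the basis covers the final time point.
        N[int(np.nonzero(knots[:-1] < knots[1:])[0][-1])] = 1.0
    for p in range(1, degree + 1):
        N_next = np.zeros(m - p)
        for i in range(m - p):
            left = right = 0.0
            if knots[i + p] > knots[i]:
                left = (t - knots[i]) / (knots[i + p] - knots[i]) * N[i]
            if knots[i + p + 1] > knots[i + 1]:
                right = (knots[i + p + 1] - t) / (knots[i + p + 1] - knots[i + 1]) * N[i + 1]
            N_next[i] = left + right
        N = N_next
    return N


class SplineResNet(nn.Module):
    # Forward-Euler ODE-network whose layer weights are B-spline reconstructions
    # of a small set of trainable coefficient tensors.
    def __init__(self, width, n_layers, n_coeffs=5, degree=2, T=1.0):
        super().__init__()
        self.n_layers, self.h = n_layers, T / n_layers
        # Trainable spline coefficients: one weight matrix and bias per basis function.
        self.W = nn.Parameter(0.1 * torch.randn(n_coeffs, width, width))
        self.b = nn.Parameter(torch.zeros(n_coeffs, width))
        # Fixed (non-trainable) basis values at the layer time points t_k = k * h,
        # using a clamped knot vector on [0, T].
        knots = np.concatenate(([0.0] * degree,
                                np.linspace(0.0, T, n_coeffs - degree + 1),
                                [T] * degree))
        basis = np.stack([bspline_basis(k * self.h, knots, degree)
                          for k in range(n_layers)])
        self.register_buffer("basis", torch.tensor(basis, dtype=torch.float32))

    def forward(self, x):
        for k in range(self.n_layers):
            c = self.basis[k]                              # basis values at layer k
            Wk = torch.einsum("j,jab->ab", c, self.W)      # W(t_k) = sum_j B_j(t_k) * C_j
            bk = torch.einsum("j,ja->a", c, self.b)
            x = x + self.h * torch.tanh(x @ Wk.T + bk)     # one forward-Euler step
        return x

In this sketch, refining the time discretization (increasing n_layers) leaves the number of trainable parameters unchanged, since that number is governed solely by n_coeffs; this is the decoupling of propagation accuracy from the learning problem described above.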
