Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks

We propose two approaches to locally adaptive activation functions, namely layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. Local adaptation of the activation function is achieved by introducing a scalable parameter in each layer (layer-wise) or for every neuron (neuron-wise) separately, and then optimizing it with a variant of the stochastic gradient descent algorithm. To further increase the training speed, a slope recovery term based on the activation slopes is added to the loss function, which accelerates convergence and thereby reduces the training cost. On the theoretical side, we prove that in the proposed method the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method cannot be achieved by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate convergence by implicitly multiplying conditioning matrices to the gradient of the base method, without any explicit computation of the conditioning matrix or the matrix–vector product. Different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with slope recovery are shown to accelerate the training process.
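The sketch below illustrates the layer-wise idea described above, assuming a PyTorch implementation with tanh activations: each hidden layer k carries a trainable scalar a_k that rescales the pre-activation (with a fixed factor n), and a slope recovery term is added to the loss. The class name LAAFNet, the choice n = 10, and the exact form of the penalty (the reciprocal of the mean of exp(a_k)) are illustrative assumptions, not the paper's reference code.

```python
# Minimal sketch of a layer-wise locally adaptive activation function (hypothetical
# names): one trainable slope parameter per hidden layer plus a slope recovery term.
import math
import torch
import torch.nn as nn

class LAAFNet(nn.Module):
    """Fully connected network with layer-wise adaptive tanh activations."""
    def __init__(self, sizes, n=10.0):
        super().__init__()
        self.linears = nn.ModuleList(
            [nn.Linear(m, k) for m, k in zip(sizes[:-1], sizes[1:])]
        )
        # One trainable slope parameter a_k per hidden layer, initialized so n * a_k = 1.
        self.a = nn.Parameter(torch.full((len(sizes) - 2,), 1.0 / n))
        self.n = n  # fixed scaling factor

    def forward(self, x):
        for k, layer in enumerate(self.linears[:-1]):
            # Adaptive activation: slope of tanh is rescaled by the learned n * a_k.
            x = torch.tanh(self.n * self.a[k] * layer(x))
        return self.linears[-1](x)

    def slope_recovery(self):
        # Assumed form of the slope recovery penalty: it grows when the mean of
        # exp(a_k) is small, pushing the activation slopes up during training.
        return 1.0 / torch.mean(torch.exp(self.a))

# Usage: add the slope recovery term to the data (or PDE residual) loss.
model = LAAFNet([1, 20, 20, 1])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.linspace(-1.0, 1.0, 64).unsqueeze(-1)
y = torch.sin(math.pi * x)
for _ in range(100):
    optimizer.zero_grad()
    loss = torch.mean((model(x) - y) ** 2) + model.slope_recovery()
    loss.backward()
    optimizer.step()
```

A neuron-wise variant would follow the same pattern, with a vector of trainable parameters per layer (one entry per neuron) instead of a single scalar.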
