ReLU Deep Neural Networks and Linear Finite Elements

In this paper, we investigate the relationship between deep neural networks (DNNs) that use the rectified linear unit (ReLU) as the activation function and continuous piecewise linear (CPWL) functions, especially CPWL functions arising from the simplicial linear finite element method (FEM). We first consider the special case of FEM. By exploring the DNN representation of its nodal basis functions, we present a ReLU DNN representation of CPWL functions in FEM. We theoretically establish that at least 2 hidden layers are needed in a ReLU DNN to represent an arbitrary linear finite element function in Ω ⊆ R^d when d ≥ 2. Consequently, for d = 2, 3, which are often encountered in scientific and engineering computing, two hidden layers are both necessary and sufficient for any CPWL function to be represented by a ReLU DNN. We then give a detailed account of how a general CPWL function in R^d can be represented by a ReLU DNN with at most ⌈log_2(d+1)⌉ hidden layers, and we also give an estimate of the number of neurons needed in such a representation. Furthermore, using the relationship between DNN and FEM, we theoretically argue that a special class of DNN models with low bit-width is still expected to have adequate representation power in applications. Finally, as a proof of concept, we present numerical results for using ReLU DNNs to solve a two-point boundary value problem, demonstrating the potential of applying DNNs to the numerical solution of partial differential equations.

Mathematics subject classification: 26B40, 65N30, 65N99.
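To make the link between nodal basis functions and ReLU networks concrete, here is a minimal sketch for the simplest setting, a one-dimensional mesh. It is an illustration only, not the construction for d ≥ 2 described above: in 1D a single hidden layer with three ReLU neurons already reproduces a hat function exactly. The function names `relu`, `hat`, and `fe_function` are hypothetical and chosen for this example.

```python
def relu(t):
    """Rectified linear unit: max(t, 0)."""
    return max(t, 0.0)


def hat(x, left, mid, right):
    """1D nodal basis (hat) function written as a one-hidden-layer ReLU network.

    The three hidden neurons are relu(x - left), relu(x - mid), relu(x - right).
    The output weights are chosen so the function rises from 0 at `left`
    to 1 at `mid`, falls back to 0 at `right`, and vanishes elsewhere.
    """
    h_left = mid - left     # width of the left element
    h_right = right - mid   # width of the right element
    return (relu(x - left) / h_left
            - (1.0 / h_left + 1.0 / h_right) * relu(x - mid)
            + relu(x - right) / h_right)


def fe_function(x, nodes, values):
    """Linear finite element function with zero boundary values.

    Sums values[i] * hat_i(x) over the interior nodes, so the whole
    function is again a one-hidden-layer ReLU network.
    """
    total = 0.0
    for i in range(1, len(nodes) - 1):
        total += values[i] * hat(x, nodes[i - 1], nodes[i], nodes[i + 1])
    return total


# Example on a uniform mesh of [0, 1] with three interior nodes.
nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [0.0, 1.0, 2.0, 1.0, 0.0]
midpoint_value = fe_function(0.375, nodes, values)  # halfway between nodes 1 and 2
```

Evaluating `fe_function` halfway between two nodes returns the average of the two nodal values, as expected for a piecewise linear interpolant. In higher dimensions the support of a nodal basis function is no longer an interval, which is where the depth requirement stated in the abstract comes in.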
