Better Approximations of High Dimensional Smooth Functions by Deep Neural Networks with Rectified Power Units

Deep neural networks with rectified linear units (ReLU) have become increasingly popular due to their universal representation power and successful applications. Theoretical progress on the approximation power of deep ReLU networks for functions in Sobolev spaces and Korobov spaces has recently been made by [D. Yarotsky, Neural Networks, 94:103-114, 2017], [H. Montanelli and Q. Du, SIAM J. Math. Data Sci., 1:78-92, 2019], and others. In this paper, we show that deep networks with rectified power units (RePU) can approximate smooth functions better than deep ReLU networks. Our analysis is based on classical polynomial approximation theory together with efficient algorithms, proposed in this paper, that convert polynomials into deep RePU networks of optimal size with no approximation error. Compared with the results on ReLU networks, the sizes of the RePU networks that our constructive proofs require to approximate functions in Sobolev and Korobov spaces to an error tolerance $\varepsilon$ are in general $\mathcal{O}(\log\frac{1}{\varepsilon})$ times smaller than the sizes of the corresponding ReLU networks constructed in most of the existing literature. Compared with the classical results of Mhaskar [Mhaskar, Adv. Comput. Math., 1:61-80, 1993], our constructions use fewer activation functions and are numerically more stable; they can serve as good initializations of deep RePU networks, which can then be further trained to break the limit of linear approximation theory. The functions represented by RePU networks are smooth, so they are a natural fit wherever derivatives are involved in the loss function.

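To make the exact-conversion step concrete, here is a minimal numerical sketch (ours, not the optimal-size algorithm constructed in the paper; the helper names `repu`, `square`, `multiply`, and `horner_repu` are hypothetical). It shows that rectified quadratic units $\sigma_2(x)=\max(0,x)^2$ realize squaring and multiplication exactly, so a polynomial can be reproduced by a RePU network with zero approximation error:

```python
import numpy as np

def repu(x, s=2):
    """Rectified power unit sigma_s(x) = max(0, x)**s; s = 2 gives rectified quadratic units."""
    return np.maximum(0.0, x) ** s

def square(x):
    """Exact x**2 from two sigma_2 units: sigma_2(x) + sigma_2(-x) = x**2 for all x."""
    return repu(x) + repu(-x)

def multiply(x, y):
    """Exact product via the polarization identity xy = ((x+y)**2 - (x-y)**2) / 4,
    realized with four sigma_2 units in a single hidden layer."""
    return (square(x + y) - square(x - y)) / 4.0

def horner_repu(coeffs, x):
    """Evaluate p(x) = c0 + c1*x + ... + cn*x**n by Horner's scheme, with every
    multiplication implemented by the sigma_2 gadget above; stacking the gadgets
    gives a RePU network reproducing p exactly (up to floating-point rounding)."""
    result = np.full_like(x, coeffs[-1], dtype=float)
    for c in reversed(coeffs[:-1]):
        result = multiply(result, x) + c   # one RePU layer per Horner step
    return result

x = np.linspace(-2.0, 2.0, 7)
coeffs = [1.0, -3.0, 0.5, 2.0]             # p(x) = 1 - 3x + 0.5x^2 + 2x^3
assert np.allclose(horner_repu(coeffs, x), np.polyval(coeffs[::-1], x))
```

This naive Horner construction spends a constant number of $\sigma_2$ units per step, hence $\mathcal{O}(n)$ units and depth for a degree-$n$ polynomial; the constructions in the paper organize such gadgets more carefully to reach optimal network size.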
[1] Jie Shen, et al. Efficient Spectral-Element Methods for the Electronic Schrödinger Equation, 2016.

[2] M. Griebel, et al. Sparse grids for the Schrödinger equation, 2007.

[3] R. Srikant, et al. Why Deep Neural Networks for Function Approximation?, 2016, ICLR.

[4] Christoph Schwab, et al. Deep ReLU networks and high-order finite element methods, 2020, Analysis and Applications.

[5] Henryk Wozniakowski, et al. When Are Quasi-Monte Carlo Algorithms Efficient for High Dimensional Integrals?, 1998, J. Complex.

[6] Fabio Nobile, et al. A Sparse Grid Stochastic Collocation Method for Partial Differential Equations with Random Input Data, 2008, SIAM J. Numer. Anal.

[7] E Weinan, et al. The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems, 2017, Communications in Mathematics and Statistics.

[8] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[9] H. N. Mhaskar, et al. Neural Networks for Optimal Approximation of Smooth and Analytic Functions, 1996, Neural Computation.

[10] Philipp Petersen, et al. Optimal approximation of piecewise smooth functions using deep ReLU neural networks, 2017, Neural Networks.

[11] Pingwen Zhang, et al. A Nonhomogeneous Kinetic Model of Liquid Crystal Polymers and Its Thermodynamic Closure Approximation, 2009.

[12] Allan Pinkus, et al. Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.

[13] Heng Xiao, et al. Data-Driven, Physics-Based Feature Extraction from Fluid Flow Fields using Convolutional Neural Networks, 2018, Communications in Computational Physics.

[14] P. Petrushev. Approximation by ridge functions and neural networks, 1999.

[15] R. DeVore, et al. Optimal nonlinear approximation, 1989.

[16] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[17] G. Palm. Warren McCulloch and Walter Pitts: A Logical Calculus of the Ideas Immanent in Nervous Activity, 1986.

[18] Dmitry Yarotsky, et al. Optimal approximation of continuous functions by very deep ReLU networks, 2018, COLT.

[19] Raul Tempone, et al. An Adaptive Sparse Grid Algorithm for Elliptic PDEs with Lognormal Diffusion Coefficient, 2016.

[20] John Morrissey, et al. Data driven, 2019, Hospitals & Health Networks.

[21] Philipp Petersen, et al. Approximation in $L^p(\mu)$ with deep ReLU neural networks, 2019.

[22] Harry Yserentant, et al. The hyperbolic cross space approximation of electronic wavefunctions, 2007, Numerische Mathematik.

[23] Qiang Du, et al. New Error Bounds for Deep ReLU Networks Using Sparse Grids, 2017, SIAM J. Math. Data Sci.

[24] Ohad Shamir, et al. The Power of Depth for Feedforward Neural Networks, 2015, COLT.

[25] Wei Guo, et al. Sparse grid discontinuous Galerkin methods for high-dimensional elliptic equations, 2015, J. Comput. Phys.

[26] Fabio Nobile, et al. Error Analysis of the Dynamically Orthogonal Approximation of Time Dependent Random PDEs, 2015, SIAM J. Sci. Comput.

[27] Ian H. Sloan, et al. Why Are High-Dimensional Finance Problems Often of Low Effective Dimension?, 2005, SIAM J. Sci. Comput.

[28] Juncai He, et al. ReLU Deep Neural Networks and Linear Finite Elements, 2020.

[29] Yoshua Bengio, et al. Shallow vs. Deep Sum-Product Networks, 2011, NIPS.

[30] Aihui Zhou, et al. A sparse finite element method with high accuracy, Part I, 2001, Numerische Mathematik.

[31] Thomas Hofmann, et al. Greedy Layer-Wise Training of Deep Networks, 2007.

[32] Matus Telgarsky, et al. Benefits of Depth in Neural Networks, 2016, COLT.

[33] Hrushikesh Narhar Mhaskar, et al. Approximation properties of a multilayered feedforward artificial neural network, 1993, Adv. Comput. Math.

[34] Dmitry Yarotsky, et al. Error bounds for approximations with deep ReLU networks, 2016, Neural Networks.

[35] Bo Li, et al. PowerNet: Efficient Representations of Polynomials and Smooth Functions by Deep Neural Networks with Rectified Power Units, 2019, ArXiv.

[36] Jie Shen, et al. A Nodal Sparse Grid Spectral Element Method for Multi-dimensional Elliptic Partial Differential Equations, 2017.

[37] Charles K. Chui, et al. Deep Nets for Local Manifold Learning, 2016, Front. Appl. Math. Stat.

[38] Jie Shen, et al. Sparse Spectral Approximations of High-Dimensional Problems Based on Hyperbolic Cross, 2010, SIAM J. Numer. Anal.

[39] E Weinan, et al. Exponential convergence of the deep neural network approximation for analytic functions, 2018, Science China Mathematics.

[40] Matus Telgarsky, et al. Representation Benefits of Deep Feedforward Networks, 2015, ArXiv.

[41] G. Mellor, et al. Development of a turbulence closure model for geophysical fluid problems, 1982.

[42] Thomas Gerstner, et al. Numerical integration using sparse grids, 2004, Numerical Algorithms.

[43] W. Pitts, et al. A Logical Calculus of the Ideas Immanent in Nervous Activity (1943), 2021, Ideas That Created the Future.

[44] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.

[45] Christoph Schwab, et al. Sparse Finite Elements for Stochastic Elliptic Problems – Higher Order Moments, 2003, Computing.

[46] E Weinan, et al. Deep Potential Molecular Dynamics: a scalable model with the accuracy of quantum mechanics, 2017, Physical Review Letters.

[47] T. Carrington, et al. Solving the Schroedinger equation using Smolyak interpolants, 2013, The Journal of Chemical Physics.

[48] E. Weinan, et al. Deep Potential: a general representation of a many-body potential energy surface, 2017, arXiv:1707.01478.

[49] H. Bungartz, et al. Sparse grids, 2004, Acta Numerica.

[50] Walter Gautschi, et al. Optimally scaled and optimally conditioned Vandermonde and Vandermonde-like matrices, 2011.

[51] E Weinan, et al. Model Reduction with Memory and the Machine Learning of Dynamical Systems, 2018, Communications in Computational Physics.

[52] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[53] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.

[54] Jie Shen, et al. Spectral Methods: Algorithms, Analysis and Applications, 2011.

[55] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control Signals Syst.

[56] Jie Shen, et al. Approximations by orthonormal mapped Chebyshev functions for higher-dimensional problems in unbounded domains, 2014, J. Comput. Appl. Math.

[57] Haijun Yu, et al. A dynamic-solver-consistent minimum action method: With an application to 2D Navier-Stokes equations, 2017, J. Comput. Phys.

[58] Jinchao Xu, et al. ReLU Deep Neural Networks and Linear Finite Elements, 2018, Journal of Computational Mathematics.

[59] Kai Staats, et al. Finding the Origin of Noise Transients in LIGO Data with Machine Learning, 2018, Communications in Computational Physics.

[60] Hans-Joachim Bungartz, et al. An adaptive Poisson solver using hierarchical bases and sparse grids, 1991, Forschungsberichte, TU Munich.

[61] Vladimir N. Temlyakov, et al. Hyperbolic Cross Approximation, 2016, arXiv:1601.03978.

[62] Jie Shen, et al. Efficient Spectral Sparse Grid Methods and Applications to High-Dimensional Elliptic Equations II. Unbounded Domains, 2012, SIAM J. Sci. Comput.

[63] Arnulf Jentzen, et al. Solving high-dimensional partial differential equations using deep learning, 2017, Proceedings of the National Academy of Sciences.

[64] Erich Novak, et al. High dimensional polynomial interpolation on sparse grids, 2000, Adv. Comput. Math.

[65] H. Mhaskar, et al. Neural networks for localized approximation, 1994.

[66] Jie Shen, et al. Efficient Spectral Sparse Grid Methods and Applications to High-Dimensional Elliptic Problems, 2010, SIAM J. Sci. Comput.