The Random Feature Model for Input-Output Maps between Banach Spaces

Well known to the machine learning community, the random feature model, originally introduced by Rahimi and Recht in 2008, is a parametric approximation to kernel interpolation or regression methods. It is typically used to approximate functions mapping a finite-dimensional input space to the real line. In this paper, we instead propose a methodology for using the random feature model as a data-driven surrogate for operators that map an input Banach space to an output Banach space. Although the methodology is quite general, we consider operators defined by partial differential equations (PDEs); here, the inputs and outputs are themselves functions: the inputs are the functions required to specify the problem, such as initial data or coefficients, and the outputs are the corresponding solutions. Upon discretization, the model inherits several desirable attributes from this infinite-dimensional, function-space viewpoint, including mesh-invariant approximation error with respect to the true PDE solution map and the capability to be trained at one mesh resolution and then deployed at different mesh resolutions. We view the random feature model as a non-intrusive data-driven emulator, provide a mathematical framework for its interpretation, and demonstrate its ability to efficiently and accurately approximate the nonlinear parameter-to-solution maps of two prototypical PDEs arising in physical science and engineering applications: viscous Burgers' equation and a variable-coefficient elliptic equation.
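To fix ideas, the following is a minimal sketch of the classical finite-dimensional random feature model that the paper generalizes; it is illustrative only and is not the paper's operator-valued implementation. Random Fourier features are drawn once and frozen, and only the linear coefficients are fit, here by ridge-regularized least squares. Because the trained model is a genuine function rather than a vector of grid values, it can be fit on a coarse grid and evaluated on a finer one, loosely mirroring the train-at-one-resolution, deploy-at-another property described above. The target function, feature count, and regularization strength below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_features(X, W, b):
    """Random Fourier features phi(x) = sqrt(2/m) * cos(W x + b)."""
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

def f_true(x):
    """Toy scalar target on [0, 1], standing in for a PDE solution map."""
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)

# Draw the feature parameters once and freeze them.
m = 256                                   # number of random features
W = rng.normal(scale=8.0, size=(m, 1))    # random frequencies
b = rng.uniform(0.0, 2 * np.pi, size=m)   # random phase shifts

# "Train" on a coarse grid: solve the ridge-regularized normal
# equations for the coefficients alpha; the features stay fixed.
X_train = np.linspace(0.0, 1.0, 64)[:, None]
y_train = f_true(X_train[:, 0])
Phi = make_features(X_train, W, b)
lam = 1e-8
alpha = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y_train)

# "Deploy" on a finer grid: the same alpha is reused because the
# features are functions, not grid values.
X_test = np.linspace(0.0, 1.0, 512)[:, None]
y_pred = make_features(X_test, W, b) @ alpha
print("max error on fine grid:", np.abs(y_pred - f_true(X_test[:, 0])).max())
```

In the paper's operator setting, the random features are themselves randomly drawn functions (constructed, e.g., in Fourier space) and the training data are discretized input-output function pairs, but the two-stage structure, namely drawing random features and then solving a linear least-squares problem for the coefficients, is the same.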

References

[1] R. Ghanem, et al., Stochastic Finite Element Expansion for Random Media, 1989.

[2] Ambuj Tewari, et al., On the Approximation Properties of Random ReLU Features, 2018.

[3] Andrew R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.

[4] Radford M. Neal, Priors for Infinite Networks, 1996.

[6] R. Shterenberg, et al., Blow up and regularity for fractal Burgers equation, 2008, arXiv:0804.3549.

[7] Ronald DeVore, et al., The Theoretical Foundation of Reduced Basis Methods, 2014.

[8] Martin J. Mohlenkamp, et al., Algorithms for Numerical Analysis in High Dimensions, 2005, SIAM J. Sci. Comput.

[9] E Weinan, et al., The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems, 2017, Communications in Mathematics and Statistics.

[10] Ioannis G. Kevrekidis, et al., Discrete- vs. Continuous-Time Nonlinear Signal Processing of Cu Electrodissolution Data, 1992.

[11] Tim Colonius, et al., FiniteNet: A Fully Convolutional LSTM Network Architecture for Time-Dependent Partial Differential Equations, 2020, arXiv.

[12] A. Cohen, et al., Optimal weighted least-squares methods, 2016, arXiv:1608.00512.

[13] William Thielicke, FFT?, 2020.

[14] D. Gilbarg, et al., Elliptic Partial Differential Equations of Second Order, 1977.

[15] Yury Korolev, Two-layer neural networks with values in a Banach space, 2021, arXiv.

[16] Nikola B. Kovachki, et al., Fourier Neural Operator for Parametric Partial Differential Equations, 2020, ICLR.

[17] Zhiwen Zhang, et al., A Data-Driven Stochastic Method for Elliptic PDEs with Random Coefficients, 2013, SIAM/ASA J. Uncertain. Quantification.

[18] Bengt Fornberg, A practical guide to pseudospectral methods: Introduction, 1996.

[19] Eldad Haber, et al., Stable architectures for deep neural networks, 2017, arXiv.

[20] A. Rahimi, et al., Uniform approximation of functions with random bases, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.

[21] W. D. Evans, et al., Partial Differential Equations, 1941.

[22] Lexing Ying, et al., Meta-learning Pseudo-differential Operators with Deep Neural Networks, 2019, J. Comput. Phys.

[23] Holger Wendland, Scattered Data Approximation: Conditionally positive definite functions, 2004.

[24] A. Rahimi, et al., Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning, 2008, NIPS.

[25] Anthony Nouy, et al., Low-rank methods for high-dimensional approximation and model order reduction, 2015, arXiv:1511.01554.

[26] Yuan Cao, et al., Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks, 2019, NeurIPS.

[27] Lexing Ying, et al., Solving parametric PDE problems with artificial neural networks, 2017, European Journal of Applied Mathematics.

[28] Paul J. Atzberger, et al., GMLS-Nets: A framework for learning from unstructured data, 2019, arXiv.

[29] Robert Tibshirani, et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2001, Springer Series in Statistics.

[30] J. Hesthaven, et al., Non-intrusive reduced order modeling of nonlinear problems using neural networks, 2018, J. Comput. Phys.

[31] Gitta Kutyniok, et al., A Theoretical Analysis of Deep Neural Networks and Parametric PDEs, 2019, Constructive Approximation.

[33] Benjamin Peherstorfer, et al., Survey of multifidelity methods in uncertainty propagation, inference, and optimization, 2018, SIAM Rev.

[34] Christopher K. I. Williams, Computing with Infinite Networks, 1996, NIPS.

[35] Stéphane Canu, et al., Operator-valued Kernels for Learning from Functional Response Data, 2015, J. Mach. Learn. Res.

[36] Olivier Desjardins, et al., Nonlinear integro-differential operator regression with neural networks, 2018, arXiv.

[37] Ravi G. Patel, et al., A physics-informed operator regression framework for extracting data-driven continuum models, 2020, arXiv.

[38] Ben Adcock, et al., Deep Neural Networks Are Effective At Learning High-Dimensional Hilbert-Valued Functions From Limited Data, 2020, MSML.

[39] Jian-Xun Wang, et al., Non-intrusive model reduction of large-scale, nonlinear dynamical systems using deep learning, 2019, Physica D: Nonlinear Phenomena.

[40] Y. Marzouk, et al., Data-driven forward discretizations for Bayesian inversion, 2020, Inverse Problems.

[41] Gianluca Iaccarino, et al., A least-squares approximation of partial differential equations with high-dimensional random inputs, 2009, J. Comput. Phys.

[42] Jacob Bear, et al., Fundamentals of transport phenomena in porous media, 1984.

[43] Yingzhou Li, et al., Variational training of neural network approximations of solution maps for physical models, 2019, J. Comput. Phys.

[45] A. Berlinet, et al., Reproducing kernel Hilbert spaces in probability and statistics, 2004.

[46] G. Roberts, et al., MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster, 2012, arXiv:1202.0709.

[47] Stefan Ulbrich, et al., Optimization with PDE Constraints, 2008, Mathematical Modelling.

[48] Christoph Schwab, et al., Deep learning in high dimension: Neural network expression rates for generalized polynomial chaos expansions in UQ, 2018, Analysis and Applications.

[49] Benjamin Recht, et al., Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[50] Peng Chen, et al., Derivative-Informed Projected Neural Networks for High-Dimensional Parametric Maps Governed by PDEs, 2020, Computer Methods in Applied Mechanics and Engineering.

[51] Felipe Cucker, et al., On the mathematical foundations of learning, 2001.

[52] Hari Sundar, et al., FFT, FMM, or Multigrid? A Comparative Study of State-of-the-Art Poisson Solvers for Uniform and Nonuniform Grids in the Unit Cube, 2014, SIAM J. Sci. Comput.

[53] Bin Dong, et al., PDE-Net: Learning PDEs from Data, 2017, ICML.

[54] Carl E. Rasmussen, et al., Gaussian processes for machine learning, 2005, Adaptive Computation and Machine Learning.

[55] E Weinan, et al., A mean-field optimal control formulation of deep learning, 2018, Research in the Mathematical Sciences.

[56] Charles A. Micchelli, et al., On Learning Vector-Valued Functions, 2005, Neural Computation.

[57] Nikola B. Kovachki, et al., Model Reduction and Neural Networks for Parametric PDEs, 2020, The SMAI Journal of Computational Mathematics.

[58] Lexing Ying, et al., Solving Electrical Impedance Tomography with Deep Learning, 2019, J. Comput. Phys.

[59] Albert Cohen, et al., Approximation of high-dimensional parametric PDEs, 2015, Acta Numerica.

[60] C. Carmeli, et al., Vector Valued Reproducing Kernel Hilbert Spaces of Integrable Functions and Mercer Theorem, 2006.

[61] Gitta Kutyniok, et al., Numerical Solution of the Parametric Diffusion Equation by Deep Neural Networks, 2020, Journal of Scientific Computing.

[62] Francis R. Bach, On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions, 2015, J. Mach. Learn. Res.

[63] Albert Cohen, et al., Sparse adaptive Taylor approximation algorithms for parametric and stochastic elliptic PDEs, 2011.

[64] Stephan Hoyer, et al., Learning data-driven discretizations for partial differential equations, 2018, Proceedings of the National Academy of Sciences.

[65] Ioannis G. Kevrekidis, et al., Identification of distributed parameter systems: A neural net based approach, 1998.

[66] George Em Karniadakis, et al., DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators, 2019, arXiv.

[67] N. Nguyen, et al., An ‘empirical interpolation’ method: application to efficient reduced-basis discretization of partial differential equations, 2004.

[68] Nicholas Zabaras, et al., Bayesian Deep Convolutional Encoder-Decoder Networks for Surrogate Modeling and Uncertainty Quantification, 2018, J. Comput. Phys.

[69] E Weinan, et al., On the Generalization Properties of Minimum-norm Solutions for Over-parameterized Neural Network Models, 2019, arXiv.

[70] A. Stuart, et al., The Bayesian Approach to Inverse Problems, 2013, arXiv:1302.6989.

[71] A. Caponnetto, et al., Optimal Rates for the Regularized Least-Squares Algorithm, 2007, Found. Comput. Math.

[72] F. Lutscher, Spatial Variation, 2019, Interdisciplinary Applied Mathematics.

[73] Holger Wendland, et al., Kernel-Based Reconstructions for Parametric PDEs, 2017, Meshfree Methods for Partial Differential Equations IX.

[74] Michael Griebel, et al., Reproducing Kernel Hilbert Spaces for Parametric Partial Differential Equations, 2017, SIAM/ASA J. Uncertain. Quantification.

[75] Karthik Ramani, et al., ConvPDE-UQ: Convolutional neural networks with quantified uncertainty for heterogeneous elliptic partial differential equations on varied domains, 2019, J. Comput. Phys.

[76] Kookjin Lee, et al., Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders, 2018, J. Comput. Phys.

[77] Lei Wu, et al., Machine learning from a continuous viewpoint, I, 2019, Science China Mathematics.

[78] D. Xiu, et al., Data-Driven Deep Learning of Partial Differential Equations in Modal Space, 2019, J. Comput. Phys.

[79] Kamyar Azizzadenesheli, et al., Neural Operator: Graph Kernel Network for Partial Differential Equations, 2020, ICLR.

[80] Hong Chen, et al., Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems, 1995, IEEE Trans. Neural Networks.

[82] Paris Perdikaris, et al., Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, 2019, J. Comput. Phys.

[83] Krzysztof J. Fidkowski, et al., Output-Based Error Estimation and Mesh Adaptation Using Convolutional Neural Networks: Application to a Scalar Advection-Diffusion Problem, 2020.

[84] Justin A. Sirignano, et al., DGM: A deep learning algorithm for solving partial differential equations, 2017, J. Comput. Phys.

[86] Eldad Haber, et al., Deep Neural Networks Motivated by Partial Differential Equations, 2018, Journal of Mathematical Imaging and Vision.

[87] Rüdiger Verfürth, et al., Adaptive finite element methods for elliptic equations with non-smooth coefficients, 2000, Numerische Mathematik.

[88] Ilias Bilionis, et al., Deep UQ: Learning deep neural network surrogate models for high dimensional uncertainty quantification, 2018, J. Comput. Phys.

[89] D. Luenberger, Optimization by Vector Space Methods, 1968.

[90] Marco A. Iglesias, et al., Hierarchical Bayesian level set inversion, 2016, Statistics and Computing.

[91] A. Cohen, et al., Model Reduction and Approximation: Theory and Algorithms, 2017.

[92] E Weinan, et al., A Proposal on Machine Learning via Dynamical Systems, 2017, Communications in Mathematics and Statistics.

[93] Markus Heinonen, et al., Random Fourier Features for Operator-Valued Kernels, 2016, ACML.

[94] C. Schwab, et al., Deep learning in high dimension: ReLU network expression rates for Bayesian PDE inversion, 2020.

[95] Lloyd N. Trefethen, et al., Fourth-Order Time-Stepping for Stiff PDEs, 2005, SIAM J. Sci. Comput.

[96] Fabrice Rossi, et al., Functional multi-layer perceptron: a non-linear tool for functional data analysis, 2007, Neural Networks.

[97] Mikhail Belkin, et al., Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.

[98] Arthur Jacot, et al., Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.

[99] N. Aronszajn, Theory of Reproducing Kernels, 1950.

[100] Simone Deparis, et al., Data driven approximation of parametrized PDEs by Reduced Basis and Neural Networks, 2019, J. Comput. Phys.