Fine-tuning Neural-Operator architectures for training and generalization

This work provides a comprehensive analysis of the generalization properties of Neural Operators (NOs) and their derived architectures. Through empirical evaluation of the test loss, analysis of complexity-based generalization bounds, and qualitative assessment of loss-landscape visualizations, we investigate modifications aimed at enhancing the generalization capabilities of NOs. Inspired by the success of Transformers, we propose ${\textit{s}}{\text{NO}}+\varepsilon$, which introduces a kernel integral operator in lieu of self-attention. Our results reveal significantly improved performance across datasets and initializations, accompanied by qualitative changes in the loss-landscape visualizations. We conjecture that the Transformer layout enables the optimization algorithm to find better minima, and that stochastic depth improves generalization performance. Since a rigorous analysis of training dynamics remains one of the most prominent unsolved problems in deep learning, we focus exclusively on the complexity-based generalization analysis of the architectures. Building on statistical theory, and in particular Dudley's theorem, we derive upper bounds on the Rademacher complexity of NOs and of ${\textit{s}}{\text{NO}}+\varepsilon$. For the latter, our bounds do not rely on norm control of the parameters; they therefore apply to networks of any depth, provided the random variables in the architecture follow a decay law, which connects stochastic depth with generalization, as conjectured above. In contrast, the bounds for NOs rely solely on norm control of the parameters and exhibit an exponential dependence on depth. Furthermore, our experiments demonstrate that the proposed network retains remarkable generalization capabilities under perturbations of the data distribution, whereas NOs perform poorly in out-of-distribution scenarios.
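
The architectural idea described above, a Transformer-style block whose self-attention sublayer is replaced by a kernel integral operator, with stochastic depth applied to the residual branches, can be illustrated with a short PyTorch fragment. This is a minimal sketch under our own assumptions, not the authors' implementation of ${\textit{s}}{\text{NO}}+\varepsilon$: the class names (`SpectralMixer`, `SNOBlock`), the Fourier parameterization of the kernel integral operator, and the unscaled branch-dropping rule are hypothetical choices made purely for illustration.

```python
# Hypothetical sketch of a Transformer-layout block with a kernel integral
# operator as the token mixer and stochastic depth on the residual branches.
import torch
import torch.nn as nn


class SpectralMixer(nn.Module):
    """Kernel integral operator realized by truncated Fourier multiplication
    (FNO-style); assumes `modes` does not exceed the number of rFFT coefficients."""

    def __init__(self, width: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (width * width)
        # Complex spectral weights acting on the lowest `modes` Fourier modes.
        self.weights = nn.Parameter(
            scale * torch.randn(width, width, modes, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, width, n_grid)
        x_ft = torch.fft.rfft(x, dim=-1)
        out_ft = torch.zeros_like(x_ft)
        out_ft[..., : self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[..., : self.modes], self.weights
        )
        return torch.fft.irfft(out_ft, n=x.size(-1), dim=-1)


class SNOBlock(nn.Module):
    """Pre-norm Transformer layout: (norm -> mixer) and (norm -> MLP), with each
    residual branch dropped with probability `drop_path` during training
    (a simplified, unscaled form of stochastic depth)."""

    def __init__(self, width: int, modes: int, drop_path: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(width)
        self.mixer = SpectralMixer(width, modes)
        self.norm2 = nn.LayerNorm(width)
        self.mlp = nn.Sequential(
            nn.Linear(width, 4 * width), nn.GELU(), nn.Linear(4 * width, width)
        )
        self.drop_path = drop_path

    def _keep_branch(self) -> bool:
        # Always keep at evaluation time; drop with probability `drop_path` in training.
        return (not self.training) or torch.rand(1).item() >= self.drop_path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_grid, width); the mixer acts along the grid dimension.
        if self._keep_branch():
            x = x + self.mixer(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        if self._keep_branch():
            x = x + self.mlp(self.norm2(x))
        return x
```

In the spirit of the decay law mentioned above, one could assign larger drop probabilities to deeper blocks in a stack of such modules, so that the survival probabilities of the residual branches decay with depth.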
