Optimal criteria and their asymptotic form for data selection in data-driven reduced-order modelling with Gaussian process regression

We derive criteria for selecting the datapoints used in data-driven reduced-order modeling and in other areas of supervised learning based on Gaussian process regression (GPR). Although data selection is well studied in active learning and optimal experimental design, most criteria in the literature are empirical. Here we introduce an optimality condition for the selection of a new input, defined as the minimizer of the distance between the approximated output probability density function (pdf) of the reduced-order model and the exact one. Because the exact pdf is unknown, we define the selection criterion as the supremum over the unit sphere of the native Hilbert space for the GPR. The resulting criterion, however, is difficult to compute in this form. We combine results from GPR theory and asymptotic analysis to derive a computable form of the optimality criterion that is valid in the limit of small predictive variance. The derived asymptotic form leads to convergence of the GPR model in a way that guarantees a balanced distribution of data resources between probable and large-deviation outputs, resulting in an effective sampling strategy for data-driven reduced-order modeling.
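The abstract does not state the computable form of the criterion, so the following is only a minimal sketch of what an output-weighted selection step could look like. It assumes a stand-in score, the GPR predictive variance weighted by the likelihood ratio w(x) = p_x(x) / p_mu(mu(x)), where p_x is the input pdf and p_mu is the pdf of the posterior mean. This weight, the RBF kernel, the histogram density estimate, the test function `f`, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    # Squared-exponential kernel with unit prior variance (assumed choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_train, y_train, x_cand, ell=0.3, noise=1e-6):
    # Standard GPR posterior mean and variance at the candidate inputs.
    K = rbf(x_train, x_train, ell) + noise * np.eye(len(x_train))
    Ks = rbf(x_cand, x_train, ell)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def select_next(x_train, y_train, x_cand, p_x, bins=20):
    # Hypothetical score: likelihood-ratio weight times predictive variance.
    mean, var = gp_posterior(x_train, y_train, x_cand)
    # Histogram estimate of the pdf of the posterior mean, weighted by p_x.
    hist, edges = np.histogram(mean, bins=bins, weights=p_x, density=True)
    idx = np.clip(np.digitize(mean, edges[1:-1]), 0, bins - 1)
    p_mu = hist[idx]
    w = p_x / (p_mu + 1e-12)  # large where the output is rare
    score = w * var
    return int(np.argmax(score)), score

# Usage on a hypothetical 1-D map with a standard normal input pdf.
f = lambda x: np.tanh(3.0 * x)
x_cand = np.linspace(-3.0, 3.0, 201)
p_x = np.exp(-0.5 * x_cand**2) / np.sqrt(2.0 * np.pi)
x_train = np.array([-2.0, 0.0, 2.0])
y_train = f(x_train)
i_next, score = select_next(x_train, y_train, x_cand, p_x)
x_next = x_cand[i_next]  # next input to evaluate
```

The weight raises the score of inputs whose predicted output falls in a low-density region of the output pdf, which is one way to spend samples on both probable and large-deviation outputs. A kernel density estimate could replace the histogram if a smoother estimate of p_mu is needed.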
