Efficient Multiscale Gaussian Process Regression using Hierarchical Clustering

Standard Gaussian Process (GP) regression is a powerful machine learning tool, but it is computationally expensive on large datasets and can be inaccurate when data points are sparsely distributed in a high-dimensional feature space. To address these challenges, a new multiscale, sparsified GP algorithm is formulated, with the goal of application to large scientific computing datasets. In this approach, the data are partitioned into clusters and the cluster centers are used to define a reduced training set, which lowers training and evaluation costs relative to standard GPs. Further, a hierarchical technique is used to adapt the local covariance representation to the underlying sparsity of the feature space, which improves prediction accuracy when the data distribution is highly non-uniform. A theoretical investigation of the computational complexity of the algorithm is presented. The efficacy of the method is then demonstrated on smooth and discontinuous analytical functions and on data from a direct numerical simulation of turbulent combustion.
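The cluster-then-regress idea can be illustrated with a minimal sketch. It is not the algorithm of the paper: it uses plain k-means, a single global squared-exponential kernel, and no hierarchical or multiscale step. The function names (`kmeans`, `reduced_training_set`, `gp_predict`), the choice to assign each cluster center the mean target of its members, and the values of `length_scale` and `noise` are all assumptions made for illustration. The point it shows is that an exact GP fitted on M cluster centers factorizes an M x M matrix instead of an N x N one.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns centers and the final label of each point."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy index -> copy
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # leave an empty cluster's center as is
                centers[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

def reduced_training_set(X, y, k):
    """Assumed reduction: one (center, mean target) pair per non-empty cluster."""
    centers, labels = kmeans(X, k)
    keep = [j for j in range(k) if np.any(labels == j)]
    X_red = centers[keep]
    y_red = np.array([y[labels == j].mean() for j in keep])
    return X_red, y_red

def rbf_kernel(A, B, length_scale):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, length_scale=0.5, noise=1e-4):
    """Posterior mean of a zero-mean GP, solved through a Cholesky factorization."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return rbf_kernel(X_test, X_train, length_scale) @ alpha

# Toy usage: 2000 samples of sin(x), reduced to at most 30 training points.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(2000, 1))
y = np.sin(X[:, 0])
X_red, y_red = reduced_training_set(X, y, k=30)
X_query = np.linspace(0.5, 5.5, 50)[:, None]
y_pred = gp_predict(X_red, y_red, X_query)
print("reduced set size:", len(X_red))
print("max abs error:", np.max(np.abs(y_pred - np.sin(X_query[:, 0]))))
```

In this sketch the Cholesky step costs on the order of M^3 operations for M centers, compared with N^3 for an exact GP on all N points. A single global `length_scale` is the simplification that the hierarchical technique described above is meant to remove when the data density varies strongly across the feature space.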
