A Short Note on Gaussian Process Modeling for Large Datasets using Graphics Processing Units

The graphics processing unit (GPU) has emerged as a powerful and cost-effective processor for general-purpose high-performance computing. GPUs can deliver an order of magnitude more floating-point operations per second than modern central processing units (CPUs), and thus hold great promise for computationally intensive statistical applications. Fitting complex statistical models with many parameters and/or to large datasets is often very computationally expensive. In this study, we focus on Gaussian process (GP) models -- statistical models commonly used to emulate expensive computer simulators. We demonstrate that the computational cost of fitting GP models can be reduced substantially by running the implementation on a CPU+GPU heterogeneous computing system rather than on an analogous traditional system without GPU acceleration. Our small study suggests that GP models are fertile ground for further implementation on CPU+GPU systems.
