Exact Gaussian process regression with distributed computations

Gaussian processes (GPs) are powerful non-parametric Bayesian models for function estimation, but exact inference scales cubically in computation and quadratically in storage with the number of training points. To address these costs, approximation methods have flourished in the literature, spanning both model approximations and approximate inference; however, these methods often sacrifice accuracy for scalability. In this work, we present the design and evaluation of a distributed method for exact GP inference that achieves true model parallelism using simple, high-level distributed computing frameworks. Our experiments show that exact inference at scale is not only feasible but also brings substantial benefits: low error rates and accurate quantification of uncertainty.
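To make the computational bottleneck concrete, the following is a minimal single-machine sketch of exact GP regression with a squared-exponential kernel; the function names (`rbf_kernel`, `gp_posterior`) and hyperparameter values are illustrative, not taken from the paper. The Cholesky factorization in `gp_posterior` is the O(n³) step that distributed exact-inference methods parallelize.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel: k(a, b) = s^2 * exp(-||a - b||^2 / (2 l^2))."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, X_star, noise=1e-4):
    """Exact GP posterior mean and variance at test points X_star.

    Costs O(n^3) time (Cholesky of the n x n Gram matrix) and O(n^2)
    memory -- the scaling that motivates distributed exact inference.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X) + noise * np.eye(n)   # noisy training covariance
    L = np.linalg.cholesky(K)                  # K = L L^T, the cubic step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(X, X_star)
    mean = K_s.T @ alpha                       # posterior mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_star, X_star)) - np.sum(v**2, axis=0)
    return mean, var
```

In a distributed setting, the Gram matrix `K` would be stored block-wise across workers and the Cholesky factorization carried out as a blocked, communicating algorithm rather than a single `np.linalg.cholesky` call.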