When Gaussian Process Meets Big Data: A Review of Scalable GPs

The vast quantity of information brought by big data, together with evolving computer hardware, has driven success stories throughout the machine learning community. Meanwhile, it poses challenges for Gaussian process regression (GPR), a well-known nonparametric and interpretable Bayesian model, which suffers from cubic time complexity in the data size. To improve scalability while retaining desirable prediction quality, a variety of scalable GPs have been presented. However, they have not yet been comprehensively reviewed and analyzed so as to be well understood by both academia and industry. Given the explosion of data sizes, a review of scalable GPs is therefore timely and important for the GP community. To this end, this article reviews state-of-the-art scalable GPs in two main categories: global approximations, which distill the entire data set, and local approximations, which divide the data for subspace learning. For global approximations, we focus mainly on sparse approximations, comprising prior approximations that modify the prior but perform exact inference, posterior approximations that retain the exact prior but perform approximate inference, and structured sparse approximations that exploit specific structures in the kernel matrix; for local approximations, we highlight the mixture/product of experts, which performs model averaging over multiple local experts to boost predictions. To present a complete picture, recent advances for improving the scalability and capability of scalable GPs are also reviewed. Finally, the extensions and open issues of scalable GPs in various scenarios are discussed to inspire novel ideas for future research avenues.
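
As a rough illustration of the two families surveyed above, the following sketch (not taken from the paper; the kernel, synthetic data, and the particular subset-of-regressors and product-of-experts rules are assumptions made here for exposition) contrasts exact GPR, whose Cholesky factorization of the n x n kernel matrix scales as O(n^3), with a global sparse approximation built on m inducing inputs (O(nm^2)) and a local approximation that fuses independently trained experts by a product-of-experts rule.

```python
# Minimal sketch: exact GPR vs. a global sparse approximation vs. a local
# product-of-experts aggregation, on synthetic 1-D data (illustrative only).
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = variance * exp(-|a - b|^2 / (2 * lengthscale^2))."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def exact_gp_predict(X, y, Xs, noise=1e-2):
    """Exact GPR posterior mean/variance; the n x n Cholesky is the O(n^3) bottleneck."""
    Knn = rbf_kernel(X, X) + noise * np.eye(len(X))
    Kns = rbf_kernel(X, Xs)
    Kss = rbf_kernel(Xs, Xs)
    L = np.linalg.cholesky(Knn)                              # O(n^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, Kns)
    mean = Kns.T @ alpha
    var = np.diag(Kss) - np.sum(V**2, axis=0) + noise
    return mean, var

def sparse_gp_predict(X, y, Xs, Z, noise=1e-2):
    """Global sparse approximation (subset-of-regressors flavour) with m inducing inputs Z:
    only an m x m system is factorized, giving O(n m^2) training cost."""
    Kmm = rbf_kernel(Z, Z) + 1e-8 * np.eye(len(Z))
    Kmn = rbf_kernel(Z, X)
    Kms = rbf_kernel(Z, Xs)
    A = Kmm + Kmn @ Kmn.T / noise                            # m x m instead of n x n
    mean = Kms.T @ np.linalg.solve(A, Kmn @ y) / noise
    var = np.einsum('ij,ji->i', Kms.T, np.linalg.solve(A, Kms)) + noise
    return mean, var

def poe_predict(X, y, Xs, n_experts=4, noise=1e-2):
    """Local approximation: split the data, train independent experts, and fuse them with a
    product-of-experts rule (precisions add, means are precision-weighted)."""
    prec, wmean = np.zeros(len(Xs)), np.zeros(len(Xs))
    for Xi, yi in zip(np.array_split(X, n_experts), np.array_split(y, n_experts)):
        m_i, v_i = exact_gp_predict(Xi, yi, Xs, noise)       # each expert is a small exact GP
        prec += 1.0 / v_i
        wmean += m_i / v_i
    return wmean / prec, 1.0 / prec

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(400, 1))
    y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(400)
    Xs = np.linspace(-3, 3, 5)[:, None]                      # test inputs
    Z = np.linspace(-3, 3, 20)[:, None]                      # 20 inducing inputs
    for name, (m, v) in [("exact", exact_gp_predict(X, y, Xs)),
                         ("sparse", sparse_gp_predict(X, y, Xs, Z)),
                         ("PoE", poe_predict(X, y, Xs))]:
        print(name, np.round(m, 2), np.round(v, 3))
```

The three predictors agree closely on this smooth toy problem; the point of the sketch is the shape of the linear algebra, namely which matrix gets factorized (n x n, m x m, or several small blocks), since that is what determines scalability.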
