Scalable Gaussian Process Using Inexact ADMM for Big Data

Gaussian process (GP) models for machine learning have been studied extensively over the past two decades and are now widely used in many sectors. However, designing low-complexity GP models remains a challenging research problem. In this paper, we propose a novel scalable GP regression model for processing big datasets using a large number of parallel computation units. In contrast to existing methods, we solve the classic maximum-likelihood-based hyper-parameter optimization problem with a carefully designed distributed alternating direction method of multipliers (ADMM) that is parallelizable over a large number of computation units. Simulation results confirm the benefits of the proposed scalable GP model over state-of-the-art distributed methods.
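The abstract only sketches the approach, so the following is a minimal, hedged illustration of how a consensus-form ADMM can parallelize GP hyper-parameter fitting across data blocks; it is not the paper's algorithm. All names and settings here (rbf_kernel, local_nll, admm_gp, the squared-exponential kernel, the fixed noise level, the penalty rho) are illustrative assumptions, and the "inexact" local step is mimicked by capping the inner optimizer's iteration count.

```python
# A minimal sketch (not the paper's exact algorithm) of consensus ADMM for
# GP hyper-parameter learning. Data are split into J blocks; each block has
# a local negative log marginal likelihood (NLL), and ADMM drives the local
# hyper-parameter copies theta_j toward a global consensus z. All names and
# constants below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X1, X2, log_params):
    # Squared-exponential kernel; log_params = [log length-scale, log signal std].
    ell, sf = np.exp(log_params)
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def local_nll(log_params, X, y, noise=0.1):
    # Negative log marginal likelihood of one data block (constant term dropped).
    K = rbf_kernel(X, X, log_params) + noise**2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L)))

def admm_gp(blocks, theta0, rho=1.0, outer_iters=30, inner_iters=5):
    J = len(blocks)
    theta = np.tile(theta0, (J, 1))    # local hyper-parameter copies
    u = np.zeros_like(theta)           # scaled dual variables
    z = theta0.copy()                  # global consensus variable
    for _ in range(outer_iters):
        for j, (X, y) in enumerate(blocks):
            # Inexact local step: only a few iterations on the augmented objective.
            obj = lambda t, j=j, X=X, y=y: (local_nll(t, X, y)
                    + 0.5 * rho * np.sum((t - z + u[j]) ** 2))
            theta[j] = minimize(obj, theta[j], method="L-BFGS-B",
                                options={"maxiter": inner_iters}).x
        z = np.mean(theta + u, axis=0)  # consensus (averaging) step
        u += theta - z                  # dual ascent step
    return z

# Toy usage: 4 blocks of 1-D data, each processable on its own worker.
rng = np.random.default_rng(0)
blocks = []
for _ in range(4):
    X = rng.uniform(-3, 3, size=(50, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
    blocks.append((X, y))
print("consensus log-hyper-parameters:", admm_gp(blocks, np.zeros(2)))
```

Because the z-update is a simple average, the per-block theta-updates are mutually independent and can run on separate workers, which is the source of the parallelism the abstract refers to.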
