Scalable Gaussian Processes, with Guarantees: Kernel Approximations and Deep Feature Extraction

We provide a linear-time inference framework for Gaussian processes that supports automatic feature extraction through deep neural networks together with low-rank kernel approximations. Importantly, we derive approximation guarantees that bound the Kullback-Leibler divergence between the idealized Gaussian process and the one obtained from a low-rank approximation to its kernel, for two types of approximation. These yield two instantiations of our framework: Deep Fourier Gaussian Processes, based on random Fourier feature low-rank approximations, and Deep Mercer Gaussian Processes, based on truncating the Mercer expansion of the kernel. An extensive experimental evaluation of both instantiations on a broad collection of real-world datasets provides strong evidence that they outperform a wide range of state-of-the-art methods in terms of time efficiency, negative log-predictive density, and root mean squared error.
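To illustrate why a low-rank kernel approximation gives linear-time inference, the sketch below uses the standard random Fourier feature (RFF) construction: the RBF kernel is replaced by an inner product of D random cosine features, and exact GP regression reduces to Bayesian linear regression in the feature space, costing O(ND^2) rather than O(N^3). This is a minimal sketch under assumed choices (RBF kernel, function names, and hyperparameters are illustrative), not the authors' implementation of Deep Fourier Gaussian Processes, which additionally extracts features with a deep network before applying the approximation.

```python
# Minimal sketch: linear-time GP regression via random Fourier features (RFF).
# The RBF kernel k(x, x') = s^2 * exp(-||x - x'||^2 / (2 l^2)) is approximated
# by phi(x)^T phi(x'), where phi(x) = sqrt(2 s^2 / D) * cos(W x + b),
# W ~ N(0, I / l^2) and b ~ Uniform(0, 2*pi).  Inference is then Bayesian
# linear regression in D dimensions: O(N D^2) instead of O(N^3).
import numpy as np

def rff_features(X, W, b):
    """Map inputs X (N x d) to D random Fourier features."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def rff_gp_regression(X_train, y_train, X_test,
                      lengthscale=1.0, signal_var=1.0, noise_var=0.1,
                      D=500, seed=0):
    """Posterior predictive mean and variance of an RFF-approximated GP."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(D, d))
    b = rng.uniform(0.0, 2 * np.pi, size=D)

    Phi = np.sqrt(signal_var) * rff_features(X_train, W, b)   # N x D
    Phi_s = np.sqrt(signal_var) * rff_features(X_test, W, b)  # M x D

    # Bayesian linear regression: y = Phi w + eps, w ~ N(0, I), eps ~ N(0, noise_var I).
    A = Phi.T @ Phi + noise_var * np.eye(D)                    # D x D
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Phi.T @ y_train))

    mean = Phi_s @ alpha
    # Predictive variance: noise_var * (1 + diag(Phi_s A^{-1} Phi_s^T)).
    V = np.linalg.solve(L, Phi_s.T)                            # D x M
    var = noise_var * (1.0 + np.sum(V ** 2, axis=0))
    return mean, var
```

The Mercer-based instantiation follows the same pattern, with the random features replaced by the leading eigenfunctions of the kernel's Mercer expansion, truncated to a finite rank.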
