Kernels for Vector-Valued Functions: a Review

Kernel methods are among the most popular techniques in machine learning. In regularization theory they play a central role, since reproducing kernel Hilbert spaces provide a natural choice of both the hypothesis space and the regularization functional. From a probabilistic perspective they are key to Gaussian processes, where the kernel function is known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs, and a considerable amount of work has been devoted to designing and learning kernels for that setting. More recently, interest has grown in methods that deal with multiple outputs, motivated in part by frameworks such as multitask learning. In this monograph, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.
