Classes of Kernels for Machine Learning: A Statistics Perspective

In this paper, we present classes of kernels for machine learning from a statistical perspective: kernels are positive definite functions and are therefore also covariance functions. After discussing key properties of kernels and introducing a new formula for constructing kernels, we present several important classes: anisotropic stationary kernels, isotropic stationary kernels, compactly supported kernels, locally stationary kernels, nonstationary kernels, and separable nonstationary kernels. Compactly supported kernels and separable nonstationary kernels are of prime interest because they reduce the computational cost of kernel-based methods. We describe the spectral representation of each class and conclude with a discussion of the characterization of nonlinear maps that reduce nonstationary kernels to either stationarity or local stationarity.
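As an illustration of why compactly supported kernels can reduce computation, the following minimal sketch compares the Gram matrix of a Gaussian kernel with that of a compactly supported one. It is not taken from the paper: the choice of the Wendland function (1 - r)^4 (4r + 1) for r < 1 (positive definite in up to three dimensions), the function names, the random test points, and the support radius are all assumptions made for this example.

```python
import numpy as np


def pairwise_distances(x, y):
    """Euclidean distances between the rows of x and the rows of y."""
    return np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)


def gaussian_kernel(x, y, length_scale=1.0):
    """Isotropic stationary kernel with unbounded support (illustrative)."""
    d = pairwise_distances(x, y)
    return np.exp(-0.5 * (d / length_scale) ** 2)


def wendland_kernel(x, y, support=1.0):
    """Compactly supported isotropic kernel (assumed example choice).

    Uses the Wendland function (1 - r)^4 (4r + 1) for r < 1 and 0 otherwise,
    which is positive definite for inputs of dimension at most 3.
    """
    r = pairwise_distances(x, y) / support
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


# Hypothetical data: 200 random points in a 10 x 10 square.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(200, 2))

K_gauss = gaussian_kernel(points, points)
K_wend = wendland_kernel(points, points, support=1.0)

# Both Gram matrices are symmetric, so eigvalsh applies; a valid kernel
# should give no eigenvalue below zero beyond rounding error.
min_eig_gauss = np.linalg.eigvalsh(K_gauss).min()
min_eig_wend = np.linalg.eigvalsh(K_wend).min()

# Fraction of exactly-zero entries in the compactly supported Gram matrix.
sparsity = np.mean(K_wend == 0.0)

print("smallest eigenvalue (Gaussian):", min_eig_gauss)
print("smallest eigenvalue (Wendland):", min_eig_wend)
print("fraction of zero entries (Wendland):", sparsity)
```

Because most entries of the compactly supported Gram matrix are exactly zero, it could be stored and factorized with sparse matrix routines, whereas the Gaussian Gram matrix is dense.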
