Covariate Shift in Hilbert Space: A Solution via Surrogate Kernels

Covariate shift is an unconventional learning scenario in which the training and test data follow different distributions. A general principle for solving the problem is to make the training data distribution resemble that of the test domain, so that classifiers computed on the former generalize well to the latter. Current approaches typically match sample distributions in the input space; however, for kernel-based learning methods, performance depends directly on the geometry of the kernel-induced feature space. Motivated by this, we propose to match data distributions in the Hilbert space, which, given a pre-defined empirical kernel map, can be formulated as aligning kernel matrices across domains. In particular, to evaluate the similarity of kernel matrices defined on arbitrarily different samples, we introduce the novel concept of the surrogate kernel, based on Mercer's theorem. Our approach tailors model adaptation specifically to the kernel-based learning mechanism, and demonstrates promising results on several real-world applications.
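
To make the construction concrete, the sketch below shows one plausible reading of the surrogate kernel as a Nyström-style extension, which the abstract's appeal to Mercer's theorem suggests: a kernel matrix computed on one sample is "transferred" onto another sample so that the two matrices become directly comparable. This is an illustrative sketch, not the paper's exact algorithm; the function name `surrogate_kernel`, the `jitter` regularizer, and the RBF demo data are our own assumptions.

```python
import numpy as np

def surrogate_kernel(K_zz, K_xz, jitter=1e-8):
    """Transfer the kernel matrix K_zz (defined on sample Z) onto
    sample X via a Nystrom-style extension, yielding a "surrogate"
    kernel matrix on X that can be compared with one computed
    directly on X.

    K_zz : (m, m) kernel matrix on the reference sample Z
    K_xz : (n, m) cross-kernel between target sample X and Z
    Returns an (n, n) surrogate kernel matrix defined on X.
    """
    m = K_zz.shape[0]
    # Add a small ridge before inversion for numerical stability.
    K_inv = np.linalg.inv(K_zz + jitter * np.eye(m))
    return K_xz @ K_inv @ K_xz.T

# Toy usage: measure feature-space mismatch between two domains.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 5))            # source-domain sample
X = rng.normal(loc=0.5, size=(40, 5))   # covariate-shifted target sample

rbf = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None]) ** 2).sum(-1))
K_zz, K_xz, K_xx = rbf(Z, Z), rbf(X, Z), rbf(X, X)

# Frobenius distance between the kernel computed directly on X and
# the surrogate of K_zz on X: a rough proxy for domain mismatch in
# the kernel-induced feature space.
mismatch = np.linalg.norm(K_xx - surrogate_kernel(K_zz, K_xz))
print(f"feature-space mismatch: {mismatch:.3f}")
```

Once the two kernel matrices live on the same sample, any matrix discrepancy (Frobenius norm above, or a kernel-alignment score) can serve as the objective for matching the distributions across domains.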
