Analysis of Kernel Mean Matching under Covariate Shift

In real-world supervised learning, the training and test samples often follow different probability distributions, making it necessary to correct the resulting sampling bias. Focusing on a particular covariate shift problem, we derive high-probability confidence bounds for the kernel mean matching (KMM) estimator, whose convergence rate turns out to depend on a regularity measure of the regression function and on a capacity measure of the kernel. By comparing KMM with the natural plug-in estimator, we establish the superiority of the former, thereby providing concrete evidence for, and an understanding of, the effectiveness of KMM under covariate shift.
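To make the object of study concrete, the following is a minimal sketch of the KMM idea: reweight the training points so that their weighted mean embedding in the RKHS matches the test mean embedding. This simplified variant drops the box and normalization constraints of the full quadratic program and uses the ridge-regularized closed form instead; the function names (`gaussian_kernel`, `kmm_weights`) and the ridge parameter are illustrative assumptions, not part of the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kmm_weights(X_tr, X_te, sigma=1.0, ridge=1e-3):
    """Unconstrained KMM sketch: choose weights beta on the n training
    points to minimize the RKHS distance
        || (1/n) sum_i beta_i phi(x_i) - (1/m) sum_j phi(x'_j) ||^2.
    Setting the gradient to zero gives K beta = (n/m) K_te 1, so
    beta = K^{-1} kappa with kappa_i = (n/m) sum_j k(x_i, x'_j);
    a small ridge term stabilizes the inversion."""
    n, m = len(X_tr), len(X_te)
    K = gaussian_kernel(X_tr, X_tr, sigma)
    kappa = (n / m) * gaussian_kernel(X_tr, X_te, sigma).sum(axis=1)
    beta = np.linalg.solve(K + ridge * np.eye(n), kappa)
    return np.clip(beta, 0.0, None)  # density ratios are nonnegative
```

On data where the test covariates are shifted relative to the training covariates, the recovered weights approximate the density ratio p_test(x)/p_train(x), up-weighting training points that fall where test points are common.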
