On reducing sampling variance in covariate shift using control variates

Covariate shift classification problems can in principle be tackled by importance-weighting the training samples. However, the weights often scale up the sampling variance of the risk estimator dramatically. As a result, cross-validation, during which the importance-weighted risk is repeatedly evaluated, produces suboptimal hyperparameter estimates. We study the sampling variance of the importance-weighted risk estimator versus that of the oracle estimator as a function of the relative scale of the training data. We show that introducing a control variate can reduce the variance of the importance-weighted risk estimator, which leads to superior regularization parameter estimates when the training data are much smaller in scale than the test data.
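The variance-reduction idea can be sketched in a few lines. The sketch below is an illustrative toy example, not the paper's exact construction: it uses a hypothetical one-dimensional setup (source density N(0,1), target density N(0.5,1), squared loss around the target mean) and exploits the fact that the importance weights have known expectation 1 under the source distribution, so the centered weight serves as a control variate for the importance-weighted risk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: source (training) density p = N(0,1),
# target (test) density q = N(0.5,1). Importance weight w(x) = q(x)/p(x).
def w(x):
    return np.exp(-0.5 * ((x - 0.5) ** 2 - x ** 2))

def loss(x):
    return (x - 0.5) ** 2  # toy risk integrand under the target

n, reps = 2000, 500
plain, cv = [], []
for _ in range(reps):
    x = rng.standard_normal(n)           # draw from the source
    wx = w(x)
    fw = wx * loss(x)                    # importance-weighted losses
    plain.append(fw.mean())              # plain importance-weighted risk

    # Control variate: E_p[w(X)] = 1, so (wx.mean() - 1) has known mean 0.
    # beta is the standard regression estimate of the optimal coefficient.
    beta = np.cov(fw, wx)[0, 1] / wx.var()
    cv.append(fw.mean() - beta * (wx.mean() - 1.0))

print("variance without CV:", np.var(plain))
print("variance with CV:   ", np.var(cv))
```

Both estimators are unbiased for the target risk, but the control-variate version subtracts off the part of the fluctuation that is explained by the weights themselves, which is exactly the mechanism by which the weights inflate variance under covariate shift.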
