Regularized Learning for Domain Adaptation under Label Shifts

We propose Regularized Learning under Label Shifts (RLLS), a principled and practical domain-adaptation algorithm that corrects for shifts in the label distribution between a source and a target domain. We first estimate importance weights using labeled source data and unlabeled target data, and then train a classifier on the weighted source samples. We derive a generalization bound for the classifier on the target domain that is independent of the (ambient) data dimension and instead depends only on the complexity of the function class. To the best of our knowledge, this is the first generalization bound for the label-shift problem where the labels in the target domain are not available. Based on this bound, we propose a regularized estimator for the small-sample regime that accounts for the uncertainty in the estimated weights. Experiments on the CIFAR-10 and MNIST datasets show that RLLS improves classification accuracy over previous methods, especially in the low-sample and large-shift regimes.
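The abstract does not spell out the weight estimator, so the following is only a minimal sketch of the two-step idea under stated assumptions: a black-box classifier's confusion matrix on the source data is matched against its predicted-label frequencies on the target data, and a squared-norm (ridge) penalty shrinks the weights toward 1 when few samples are available. The function name `estimate_label_shift_weights`, the parameter `reg`, and the ridge formulation are illustrative choices, not the paper's exact estimator.

```python
import numpy as np

def estimate_label_shift_weights(y_src, pred_src, pred_tgt, n_classes, reg=0.0):
    """Sketch: estimate w[k] ~ q(y=k) / p(y=k) from black-box predictions.

    y_src    : true labels of the source samples (ints in [0, n_classes))
    pred_src : black-box predictions on the source samples
    pred_tgt : black-box predictions on the unlabeled target samples
    reg      : hypothetical ridge strength; larger values pull w toward 1
    """
    y_src = np.asarray(y_src)
    pred_src = np.asarray(pred_src)
    pred_tgt = np.asarray(pred_tgt)

    # Joint confusion matrix on the source: C[i, j] ~ P_source(pred = i, y = j).
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (pred_src, y_src), 1.0)
    C /= len(y_src)

    # Distribution of predicted labels on the target.
    q_pred = np.bincount(pred_tgt, minlength=n_classes) / len(pred_tgt)

    # Under label shift, C @ w = q_pred. Writing w = 1 + theta gives
    # C @ theta = q_pred - C @ 1, so theta = 0 means "no shift".
    b = q_pred - C.sum(axis=1)

    # Ridge-regularized least squares for theta (assumed simplification).
    theta = np.linalg.solve(C.T @ C + reg * np.eye(n_classes), C.T @ b)

    # Importance weights must be non-negative.
    return np.clip(1.0 + theta, 0.0, None)

# Toy usage: balanced source, target skewed toward class 1, perfect predictor.
y_src = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pred_tgt = np.array([0, 0, 1, 1, 1, 1, 1, 1])
w = estimate_label_shift_weights(y_src, y_src, pred_tgt, n_classes=2)

# Step two: per-sample weights for weighted training on the source data.
sample_weights = w[y_src]
```

The per-sample weights can then be passed to any learner that accepts sample weights. Parameterizing the problem as a deviation from 1 means that stronger regularization falls back to unweighted source training, which is the natural default when the estimated weights are too uncertain to trust.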
