Online Adaptation to Label Distribution Shift

Machine learning models often encounter distribution shifts when deployed in the real world. In this paper, we focus on adaptation to label distribution shift in the online setting, where the test-time label distribution is continually changing and the model must dynamically adapt to it without observing the true labels. Leveraging a novel analysis, we show that the lack of true labels does not hinder estimation of the expected test loss, which enables the reduction of online label shift adaptation to conventional online learning. Informed by this observation, we propose adaptation algorithms inspired by classical online learning techniques such as Follow The Leader (FTL) and Online Gradient Descent (OGD) and derive their regret bounds. We empirically verify our findings under both simulated and real-world label distribution shifts and show that OGD is particularly effective and robust to a variety of challenging label shift scenarios.
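
To make the reduction to online learning concrete, the sketch below illustrates the core idea in NumPy, assuming a fixed black-box classifier, its confusion matrix estimated on held-out labeled training data, and a known training label prior. The class name OGDLabelShiftAdapter, the learning rate, and the squared-distance surrogate objective are our illustrative choices, not the paper's exact algorithm: confusion-matrix inversion turns each unlabeled hard prediction into an unbiased estimate of the current label marginal, and OGD on the probability simplex tracks that drifting marginal, which is then used to re-weight the classifier's outputs.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection onto the probability simplex (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


class OGDLabelShiftAdapter:
    """Minimal sketch of OGD-style online label shift adaptation.

    Assumptions (ours, for illustration): a fixed black-box classifier;
    `confusion[i, j] = P(predict i | true class j)`, estimated on held-out
    labeled training data and assumed invertible; and the training label
    prior `train_prior`.
    """

    def __init__(self, train_prior, confusion, lr=0.05):
        self.train_prior = np.asarray(train_prior, dtype=float)
        self.conf_inv = np.linalg.inv(np.asarray(confusion, dtype=float))
        self.p = self.train_prior.copy()       # running estimate of test prior
        self.lr = lr

    def reweight(self, probs):
        """Adjust softmax outputs to the estimated test prior (the standard
        prior-shift correction of Saerens et al., 2002)."""
        w = np.asarray(probs, dtype=float) * (self.p / self.train_prior)
        return w / w.sum()

    def update(self, hard_pred):
        """One OGD step on a single unlabeled test point.

        The one-hot hard prediction e has expectation C q under the current
        label marginal q, so C^{-1} e is an unbiased estimate of q. We
        descend the squared distance to this estimate (an illustrative
        surrogate loss, not the paper's exact objective) and project back
        onto the simplex. `hard_pred` must come from the same classifier
        the confusion matrix was estimated for.
        """
        e = np.zeros_like(self.p)
        e[hard_pred] = 1.0
        s = self.conf_inv @ e                  # unbiased estimate of q
        grad = 2.0 * (self.p - s)              # gradient of ||p - s||^2
        self.p = project_to_simplex(self.p - self.lr * grad)


# Hypothetical usage on a stream of unlabeled test points:
# adapter = OGDLabelShiftAdapter(train_prior, confusion)
# for x in test_stream:
#     probs = model_softmax(x)                 # base classifier output
#     adapted = adapter.reweight(probs)        # prediction under shift
#     adapter.update(int(np.argmax(probs)))    # OGD step (matches C)
```

The key design point is that the confusion-matrix correction removes the need for test-time labels: each unlabeled prediction yields an unbiased estimate of the current label marginal, which is exactly the feedback a conventional online learner such as OGD requires.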
