Distributionally Robust Learning for Unsupervised Domain Adaptation

We propose a distributionally robust learning (DRL) method for unsupervised domain adaptation (UDA) that scales to modern computer vision benchmarks. DRL can be naturally formulated as a competitive two-player game between a predictor and an adversary that is allowed to corrupt the labels, subject to certain constraints; under the standard log loss, it reduces to incorporating a density ratio between the source and target domains. This formulation motivates jointly training two neural networks: the standard classification network and a discriminative network between the source and target domains for density-ratio estimation. The density ratio prevents the model from being overconfident on target inputs far from the source domain. DRL therefore provides conservative confidence estimates in the target domain, even when target labels are not available. This conservatism motivates using DRL for sample selection in self-training, an approach we term distributionally robust self-training (DRST). In our experiments, DRST produces better-calibrated probabilities and achieves state-of-the-art self-training accuracy on benchmark datasets. We also demonstrate that DRST captures shape features more effectively and reduces the extent of distributional shift during self-training.
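
To make the role of the density ratio concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that the robust prediction takes the form of a softmax whose logits are scaled by the source-to-target density ratio, and that this ratio is recovered from a domain discriminator's output d(x) = P(source | x) as d / (1 - d) under balanced domains. The names `density_ratio`, `robust_probs`, and `d_source` are hypothetical and chosen only for illustration.

```python
import math


def density_ratio(d_source, eps=1e-6):
    """Estimate P_s(x) / P_t(x) from a domain discriminator output.

    Assumption: d_source is the discriminator's probability that x comes
    from the source domain, and both domains are equally likely a priori,
    so the ratio is d / (1 - d).
    """
    d = min(max(d_source, eps), 1.0 - eps)  # clip to avoid division by zero
    return d / (1.0 - d)


def robust_probs(logits, d_source):
    """Softmax over logits scaled by the estimated density ratio (a sketch)."""
    r = density_ratio(d_source)
    scaled = [r * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]
# Input that looks equally like source and target: ratio 1, plain softmax.
near_source = robust_probs(logits, d_source=0.5)
# Input that the discriminator assigns mostly to the target domain:
# small ratio, so the prediction moves toward uniform.
far_from_source = robust_probs(logits, d_source=0.05)
print(max(near_source), max(far_from_source))
```

In this sketch, an input that the discriminator places far from the source domain receives a small ratio, which flattens the logits and pulls the predicted distribution toward uniform. This is one way to read the conservative confidence described above.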
