Paper Information - Adaptive Risk Minimization: Learning to Adapt to Domain Shift

Adaptive Risk Minimization: Learning to Adapt to Domain Shift

A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated in almost all practical applications: machine learning systems are regularly tested under distribution shift, due to changing temporal correlations, atypical end users, or other factors. In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test time shifts, corresponding to new domains or domain distributions. Most prior methods aim to learn a single robust model or invariant feature space that performs well on all domains. In contrast, we aim to learn models that adapt at test time to domain shift using unlabeled test points. Our primary contribution is to introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains. Compared to prior methods for robustness, invariance, and adaptation, ARM methods provide performance gains of 1-4% test accuracy on a number of image classification problems exhibiting domain shift.
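The idea of optimizing a model for adaptation, rather than for a single fixed predictor, can be illustrated with a small toy sketch. This is not the authors' implementation: the 1-D data, the choice of the batch mean as the adaptation signal, and all names (`sample_batch`, `train`, `accuracy`, `use_context`) are assumptions made for illustration. Each training episode is a batch drawn from one training domain; the model computes a context from the unlabeled inputs of that batch and is trained on the labeled loss after conditioning on it. A baseline without the context is trained on the same episodes for comparison.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def sample_batch(rng, offset, k=32):
    """Draw k labeled points from one toy domain (assumption: a domain
    shifts every input by `offset`; the class moves the input by +1 or -1)."""
    ys = [rng.randint(0, 1) for _ in range(k)]
    xs = [offset + (1.0 if y else -1.0) + rng.gauss(0.0, 0.3) for y in ys]
    return xs, ys


def logits(params, xs, use_context):
    w, v, b = params
    # Context = mean of the batch inputs; computing it needs no labels.
    c = sum(xs) / len(xs) if use_context else 0.0
    return [w * x + v * c + b for x in xs], c


def train(use_context, steps=3000, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, v, b = 0.0, 0.0, 0.0
    for _ in range(steps):
        # One episode = one batch from a single training domain,
        # mimicking the batch of unlabeled points seen at test time.
        xs, ys = sample_batch(rng, rng.uniform(-5.0, 5.0))
        zs, c = logits((w, v, b), xs, use_context)
        errs = [sigmoid(z) - y for z, y in zip(zs, ys)]  # d(log loss)/d(logit)
        n = len(xs)
        gw = sum(e * x for e, x in zip(errs, xs)) / n
        gv = sum(errs) / n * c  # zero when the context is disabled
        gb = sum(errs) / n
        w, v, b = w - lr * gw, v - lr * gv, b - lr * gb
    return w, v, b


def accuracy(params, use_context, offsets=(-4.5, -2.5, 2.5, 4.5), seed=1):
    """Evaluate on domains whose offsets were not sampled during training."""
    rng = random.Random(seed)
    correct = total = 0
    for offset in offsets:
        for _ in range(10):
            xs, ys = sample_batch(rng, offset)
            zs, _ = logits(params, xs, use_context)  # labels unused here
            correct += sum((z > 0) == bool(y) for z, y in zip(zs, ys))
            total += len(xs)
    return correct / total


arm_acc = accuracy(train(use_context=True), use_context=True)
erm_acc = accuracy(train(use_context=False), use_context=False)
print(f"adaptive: {arm_acc:.3f}  non-adaptive: {erm_acc:.3f}")
```

In this toy setting the adaptive model can learn to subtract the domain offset estimated from the unlabeled batch, whereas the non-adaptive baseline must use one decision threshold for every domain. The sketch only illustrates the training structure described in the abstract and says nothing about the architectures or benchmarks used in the paper.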
