Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift

A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated in almost all practical applications: machine learning systems are regularly tested on data that are structurally different from the training set, whether due to temporal correlations, particular end users, or other factors. In this work, we consider the setting where test examples are not drawn from the training distribution. Prior work has approached this problem either by attempting to be robust to all possible test-time distributions, which can degrade average performance, or by "peeking" at the test examples during training, which is not always feasible. In contrast, we propose to learn models that are adaptable: at test time, they can adapt to distribution shift using only a batch of unlabeled test data points. We acquire such models by learning to adapt to training batches sampled according to different sub-distributions, which simulate the structural distribution shifts that may occur at test time. We introduce the problem of adaptive risk minimization (ARM), a formalization of this setting that lends itself to meta-learning methods. Compared to a variety of methods under the paradigms of empirical risk minimization and robust optimization, our approach provides substantial empirical gains on image classification problems in the presence of distribution shift.
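
The procedure sketched in the abstract — adapt to a batch using only its inputs, then meta-train so that post-adaptation predictions are accurate — can be made concrete with a short example. Below is a minimal, hypothetical PyTorch sketch assuming one possible instantiation in which a context network summarizes the unlabeled batch and a prediction network conditions on that summary. The names (ContextNet, PredictionNet, arm_meta_train_step) and all architectural details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextNet(nn.Module):
    """Maps an unlabeled batch to a single context vector (averaged over examples)."""
    def __init__(self, in_dim, ctx_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, ctx_dim))

    def forward(self, x):
        # Average per-example embeddings so the context summarizes the whole batch.
        return self.net(x).mean(dim=0, keepdim=True)

class PredictionNet(nn.Module):
    """Classifier that conditions each prediction on the batch-level context."""
    def __init__(self, in_dim, ctx_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + ctx_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))

    def forward(self, x, ctx):
        ctx = ctx.expand(x.shape[0], -1)          # broadcast context to every example
        return self.net(torch.cat([x, ctx], dim=1))

def arm_meta_train_step(context_net, prediction_net, optimizer, x, y):
    """One meta-training step: adapt using only the inputs of the batch, then
    evaluate the post-adaptation loss on the labels and backpropagate through
    both the adaptation (context) and prediction networks."""
    ctx = context_net(x)                          # unlabeled adaptation step
    logits = prediction_net(x, ctx)               # post-adaptation predictions
    loss = F.cross_entropy(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    context_net = ContextNet(in_dim=32, ctx_dim=8)
    prediction_net = PredictionNet(in_dim=32, ctx_dim=8, n_classes=4)
    optimizer = torch.optim.Adam(list(context_net.parameters()) +
                                 list(prediction_net.parameters()), lr=1e-3)
    x = torch.randn(16, 32)                       # stand-in for a group-sampled batch
    y = torch.randint(0, 4, (16,))
    print(arm_meta_train_step(context_net, prediction_net, optimizer, x, y))
```

In this sketch, each training batch would be sampled from a single simulated sub-distribution (e.g., one user or one corruption type), so the context network is trained to extract whatever batch-level information helps the classifier under the corresponding group shift at test time.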
