An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization

A popular assumption for out-of-distribution generalization is that the training data comprises sub-datasets, each drawn from a distinct distribution; the goal is then to “interpolate” these distributions and “extrapolate” beyond them—this objective is broadly known as domain generalization. A common belief is that ERM can interpolate but not extrapolate and that the latter task is considerably more difficult, but these claims are vague and lack formal justification. In this work, we recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions. Under an existing notion of inter- and extrapolation based on reweighting of sub-group likelihoods, we rigorously demonstrate that extrapolation is computationally much harder than interpolation, though their statistical complexity is not significantly different. Furthermore, we show that ERM—or a noisy variant—is provably minimax-optimal for both tasks. Our framework presents a new avenue for the formal analysis of domain generalization algorithms which may be of independent interest.
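To make these notions concrete, here is one plausible formalization of the reweighting-based definitions sketched above; the symbols $\alpha$, $\mathcal{I}$, and $\mathcal{E}_\alpha$ are illustrative and need not match the paper's exact notation. Given sub-group distributions $P_1, \dots, P_k$, interpolation covers test distributions in their convex hull, while extrapolation permits affine reweightings whose coefficients may dip boundedly below zero:

\[
\mathcal{I} \;=\; \Big\{ \sum_{i=1}^{k} \lambda_i P_i \;:\; \lambda_i \ge 0,\; \sum_{i=1}^{k} \lambda_i = 1 \Big\},
\qquad
\mathcal{E}_\alpha \;=\; \Big\{ \sum_{i=1}^{k} \lambda_i P_i \;:\; \lambda_i \ge -\alpha,\; \sum_{i=1}^{k} \lambda_i = 1 \Big\},
\]

where each element of $\mathcal{E}_\alpha$ is additionally required to be a valid probability measure. Under this reading, the online game runs over rounds $t = 1, \dots, T$: the player commits to a hypothesis $h_t$, the adversary reveals a test distribution $Q_t \in \mathcal{I}$ (interpolation) or $Q_t \in \mathcal{E}_\alpha$ (extrapolation), and the player incurs risk $R_{Q_t}(h_t) = \mathbb{E}_{(x,y) \sim Q_t}[\ell(h_t(x), y)]$, with performance measured by regret against the best fixed hypothesis in hindsight.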
