Invariant Risk Minimization Games

The standard risk minimization paradigm of machine learning is brittle when the test distribution differs from the training distribution because of spurious correlations. Training on data from many environments and finding invariant predictors reduces the effect of spurious features by concentrating models on features that have a causal relationship with the outcome. In this work, we pose such invariant risk minimization as finding the Nash equilibrium of an ensemble game among several environments. By doing so, we develop a simple training algorithm that uses best response dynamics and, in our experiments, yields similar or better empirical accuracy with much lower variance than the challenging bi-level optimization problem of Arjovsky et al. (2019). One key theoretical contribution is showing that the set of Nash equilibria for the proposed game is equivalent to the set of invariant predictors for any finite number of environments, even with nonlinear classifiers and transformations. As a result, our method also retains the generalization guarantees to a large set of environments shown in Arjovsky et al. (2019). The proposed algorithm adds to the collection of successful game-theoretic machine learning algorithms such as generative adversarial networks.
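To make the game formulation concrete, here is a minimal sketch of turn-taking training in an ensemble game. It is an illustration under stated assumptions, not the authors' implementation: the ensemble predictor is assumed to be the average of per-environment linear models, a single gradient step stands in for an exact best response, and the toy data generator and all names (make_environment, risk, play_ensemble_game) are hypothetical.

```python
import numpy as np


def make_environment(n, spurious_sign, rng):
    """Toy environment: x1 drives y; x2 tracks y with an environment-specific sign."""
    x1 = rng.normal(size=n)
    y = x1 + 0.1 * rng.normal(size=n)
    x2 = spurious_sign * y + 0.1 * rng.normal(size=n)
    return np.column_stack([x1, x2]), y


def risk(w, X, y):
    """Mean squared error of the linear predictor w on one environment."""
    return float(np.mean((X @ w - y) ** 2))


def play_ensemble_game(envs, rounds=1000, lr=0.1):
    """Players (environments) take turns; each updates only its own weights.

    Assumption: the ensemble predictor is the average of the per-environment
    weights, and one gradient step approximates a best response.
    """
    n_env = len(envs)
    w = np.zeros((n_env, envs[0][0].shape[1]))
    for _ in range(rounds):
        for e, (X, y) in enumerate(envs):
            w_av = w.mean(axis=0)          # current ensemble predictor
            residual = X @ w_av - y
            # Gradient of environment e's risk of the ensemble w.r.t. w[e];
            # the 1 / n_env factor comes from the averaging.
            grad = 2.0 * X.T @ residual / (len(y) * n_env)
            w[e] -= lr * grad              # other players' weights stay fixed
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    envs = [make_environment(500, +1.0, rng), make_environment(500, -1.0, rng)]
    w = play_ensemble_game(envs)
    w_av = w.mean(axis=0)
    for X, y in envs:
        print(risk(w_av, X, y))
```

The sketch only shows the mechanics: every player is scored on the shared ensemble but may change only its own component. It omits any constraints on the weights or learned representations, and it makes no claim about convergence or about recovering an invariant predictor in general.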

[1]  J. Nash Equilibrium Points in N-Person Games , 1950, Proceedings of the National Academy of Sciences of the United States of America.

[2]  I. Glicksberg A Further Generalization of the Kakutani Fixed Point Theorem, with Application to Nash Equilibrium Points , 1952 .

[3]  Gerard Debreu,et al.  A Social Equilibrium Existence Theorem , 1952, Proceedings of the National Academy of Sciences.

[5]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[6]  J. Pearl Causal diagrams for empirical research , 1995 .

[7]  D. Fudenberg,et al.  The Theory of Learning in Games , 1998 .

[8]  R. Ash,et al.  Probability and measure theory , 1999 .

[9]  Yoav Freund,et al.  A Short Introduction to Boosting , 1999 .

[10]  H. Shimodaira,et al.  Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .

[11]  J. Hofbauer,et al.  Best Response Dynamics for Continuous Zero-Sum Games , 2005 .

[12]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[13]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[14]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[15]  D. Garling,et al.  Inequalities: A Journey into Linear Analysis , 2007 .

[16]  M. Kawanabe,et al.  Direct importance estimation for covariate shift adaptation , 2008 .

[17]  Karsten M. Borgwardt,et al.  Covariate Shift by Kernel Mean Matching , 2009, NIPS 2009.

[18]  Bernhard Schölkopf,et al.  Causal Inference Using the Algorithmic Markov Condition , 2008, IEEE Transactions on Information Theory.

[19]  E. Barron,et al.  Best response dynamics for continuous games , 2010 .

[20]  M. Dufwenberg Game theory. , 2011, Wiley interdisciplinary reviews. Cognitive science.

[21]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[22]  Elias Bareinboim,et al.  Local Characterizations of Causal Bayesian Networks , 2011, GKR.

[23]  Bernhard Schölkopf,et al.  Information-geometric approach to inferring causal directions , 2012, Artif. Intell..

[25]  Bernhard Schölkopf,et al.  On causal and anticausal learning , 2012, ICML.

[26]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[27]  François Laviolette,et al.  Domain-Adversarial Neural Networks , 2014, ArXiv.

[28]  Jonas Peters,et al.  Causal inference by using invariant prediction: identification and confidence intervals , 2015, 1501.01332.

[29]  Michael Rabadi,et al.  Kernel Methods for Machine Learning , 2015 .

[30]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[32]  Liwei Wang,et al.  The Expressive Power of Neural Networks: A View from the Width , 2017, NIPS.

[33]  P. Jean-Jacques Herings,et al.  Best-Response Cycles in Perfect Information Games , 2017, Math. Oper. Res..

[34]  R. Sarpong,et al.  Bio-inspired synthesis of xishacorenes A, B, and C, and a new congener from fuscol , 2019, Chemical science.

[35]  Mehryar Mohri,et al.  Algorithms and Theory for Multiple-Source Adaptation , 2018, NeurIPS.

[36]  Pietro Perona,et al.  Recognition in Terra Incognita , 2018, ECCV.

[37]  Constantinos Daskalakis,et al.  Training GANs with Optimism , 2017, ICLR.

[38]  Joris M. Mooij,et al.  Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions , 2017, NeurIPS.

[39]  Christina Heinze-Deml,et al.  Invariant Causal Prediction for Nonlinear Models , 2017, Journal of Causal Inference.

[40]  Jaeho Lee,et al.  Minimax Statistical Learning with Wasserstein distances , 2017, NeurIPS.

[41]  Volkan Cevher,et al.  Finding Mixed Nash Equilibria of Generative Adversarial Networks , 2018, ICML.

[42]  S. Saria,et al.  Should I Include this Edge in my Prediction? Analyzing the Stability-Performance Tradeoff , 2019 .

[43]  Mehryar Mohri,et al.  Agnostic Federated Learning , 2019, ICML.

[44]  Rajesh Ranganath,et al.  Support and Invertibility in Domain-Invariant Representations , 2019, AISTATS.

[45]  Kun Zhang,et al.  On Learning Invariant Representation for Domain Adaptation , 2019, ArXiv.

[46]  Sergey Levine,et al.  Causal Confusion in Imitation Learning , 2019, NeurIPS.

[48]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[49]  Mahdi Milani Fard,et al.  Metric-Optimized Example Weights , 2018, ICML.

[51]  Zhengyuan Zhou,et al.  Learning in games with continuous action sets and unknown payoff functions , 2019, Math. Program..

[54]  John Duchi,et al.  Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach , 2016, Math. Oper. Res..