Neural Networks for Learning Counterfactual G-Invariances from Single Environments

Despite (or maybe because of) their astonishing capacity to fit data, neural networks are believed to have difficulty extrapolating beyond the training data distribution. This work shows that, for extrapolations based on finite transformation groups, a model's inability to extrapolate is unrelated to its capacity. Rather, the shortcoming is inherited from a learning hypothesis: examples that are not explicitly observed, even with infinitely many training examples, have underspecified outcomes in the learner's model. To endow neural networks with the ability to extrapolate over group transformations, we introduce a learning framework that is counterfactually guided by the hypothesis that invariance to any (known) transformation group is mandatory even without evidence, unless the learner deems it inconsistent with the training data. Unlike existing invariance-driven methods for (counterfactual) extrapolation, this framework allows extrapolation from a single environment. Finally, we introduce sequence and image extrapolation tasks that validate our framework and showcase the shortcomings of traditional approaches.
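
To make the notion of invariance to a finite transformation group concrete, the sketch below averages a function over the orbit of its input under the four 90-degree image rotations (the cyclic group C4), a standard construction often called the Reynolds operator. It is only an illustration of what G-invariance means, not the framework introduced in this work; the names `c4_orbit`, `group_average`, and `toy_feature` are hypothetical and chosen for this example.

```python
import numpy as np

def c4_orbit(image):
    """Return the four 90-degree rotations of a square 2-D array."""
    return [np.rot90(image, k) for k in range(4)]

def group_average(f, image):
    """Average f over the C4 orbit of image; the result is C4-invariant."""
    return np.mean([f(g_x) for g_x in c4_orbit(image)], axis=0)

def toy_feature(image):
    """Hypothetical orientation-dependent feature (an assumption for this demo)."""
    weights = np.arange(image.size, dtype=float).reshape(image.shape)
    return float(np.sum(weights * image))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
x_rot = np.rot90(x)  # a transformed input that the raw feature treats differently

# The raw feature changes under rotation; its group average does not.
raw_gap = abs(toy_feature(x) - toy_feature(x_rot))
avg_gap = abs(group_average(toy_feature, x) - group_average(toy_feature, x_rot))
print("raw feature gap:", raw_gap)
print("group-averaged gap:", avg_gap)
```

Because rotating the input only permutes the elements of its orbit, the averaged value agrees on `x` and `x_rot` up to floating-point rounding, while the raw feature generally does not.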
