A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents, and other sources of non-stationarity. We show that, under the assumption that these changes are sparse, the correct causal structural choices lead to faster adaptation to modified distributions, because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom that need to be relearned while adapting to the change. This motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this objective can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could also be used to learn an encoder that maps low-level observed variables to unobserved causal variables, leading to faster out-of-distribution adaptation. The encoder learns a representation space in which the assumptions of independent mechanisms and of small, sparse changes in these mechanisms due to actions and non-stationarities can be satisfied.
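As a rough illustration of how adaptation speed could drive a continuous structural parameter in the two-variable case, the sketch below assumes one plausible formalization that the abstract itself does not spell out: a scalar `gamma` whose sigmoid is the belief in the hypothesis A -> B, and a regret equal to the negative log of the belief-weighted mixture of the two hypotheses' online likelihoods, accumulated while each model adapts to a modified distribution. The function names and the per-episode log-likelihood values are hypothetical placeholders, not results from the paper.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def regret(gamma, loglik_ab, loglik_ba):
    """Assumed meta-objective: -log[s * L_ab + (1 - s) * L_ba], s = sigmoid(gamma).

    loglik_ab / loglik_ba are the online log-likelihoods each hypothesis
    accumulates while adapting to one modified distribution.
    """
    p = sigmoid(gamma)
    a = math.log(p) + loglik_ab
    b = math.log(1.0 - p) + loglik_ba
    m = max(a, b)  # log-sum-exp trick for stability
    return -(m + math.log(math.exp(a - m) + math.exp(b - m)))

def regret_grad(gamma, loglik_ab, loglik_ba):
    # Closed-form derivative of the regret above with respect to gamma.
    return sigmoid(gamma) - sigmoid(gamma + loglik_ab - loglik_ba)

def update_gamma(gamma, episodes, lr=0.1):
    # One gradient-descent step on gamma per adaptation episode.
    for loglik_ab, loglik_ba in episodes:
        gamma -= lr * regret_grad(gamma, loglik_ab, loglik_ba)
    return gamma

# Hypothetical episodes in which the A -> B model adapts faster
# (higher online log-likelihood) than the B -> A model.
episodes = [(-10.0, -12.0)] * 50
gamma = update_gamma(0.0, episodes)
belief_ab = sigmoid(gamma)
```

Under this assumed objective, whichever hypothesis adapts faster on average pulls `gamma` in its direction, so the belief `sigmoid(gamma)` moves toward the structure whose modules need the least relearning.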
