Paper information - A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms - 字舞流文

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities.
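The idea of using adaptation speed as a meta-learning signal for the two-variable case can be sketched in a few lines. The abstract does not state a formula, so the form below is an assumption: a structural parameter gamma weights the two hypotheses "A causes B" and "B causes A" through sigmoid(gamma), and the regret is the negative log of the resulting mixture of online likelihoods accumulated while each model adapts to the modified distribution. The function names and the log-likelihood values in the usage note are hypothetical and chosen only for illustration.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    return math.exp(-softplus(-x))


def meta_regret(gamma: float, ll_ab: float, ll_ba: float) -> float:
    """Assumed regret: -log[s * L_ab + (1 - s) * L_ba], with s = sigmoid(gamma).

    ll_ab and ll_ba are online log-likelihoods accumulated during adaptation
    under the hypotheses A -> B and B -> A respectively.
    """
    a = -softplus(-gamma) + ll_ab   # log(s) + log L_ab
    b = -softplus(gamma) + ll_ba    # log(1 - s) + log L_ba
    m = max(a, b)
    # log-sum-exp keeps the mixture stable for very negative log-likelihoods
    return -(m + math.log(math.exp(a - m) + math.exp(b - m)))


def meta_gradient(gamma: float, ll_ab: float, ll_ba: float) -> float:
    """Closed-form derivative of meta_regret with respect to gamma."""
    return sigmoid(gamma) - sigmoid(gamma + ll_ab - ll_ba)


def fit_gamma(episodes, lr: float = 1.0, gamma0: float = 0.0) -> float:
    """Plain gradient descent on gamma over (ll_ab, ll_ba) pairs.

    Each pair stands for one adaptation episode on a modified distribution.
    """
    gamma = gamma0
    for ll_ab, ll_ba in episodes:
        gamma -= lr * meta_gradient(gamma, ll_ab, ll_ba)
    return gamma
```

For example, if the A -> B model consistently adapts faster (say hypothetical episodes of `(-10.0, -12.0)`), the gradient is negative at every step, so `fit_gamma` pushes gamma upward and `sigmoid(gamma)` moves toward 1, that is, toward the A -> B hypothesis. When both models adapt equally fast the gradient is zero and gamma does not move.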

Christopher Joseph Pal | Yoshua Bengio | Tristan Deleu | Anirudh Goyal | Olexa Bilaniuk | Nan Rosemary Ke | Sébastien Lachapelle | Nasim Rahaman
