An Analysis of the Adaptation Speed of Causal Models

We consider the problem of discovering the causal process that generated a collection of datasets. We assume that all of these datasets were generated by unknown sparse interventions on a structural causal model (SCM) $G$, which we want to identify. Recently, Bengio et al. (2020) argued that, among all SCMs, $G$ is the fastest to adapt from one dataset to another, and proposed a meta-learning criterion to identify the causal direction in a two-variable SCM. While the experiments were promising, the theoretical justification was incomplete. Our contribution is a theoretical investigation of the adaptation speed of simple two-variable SCMs. We use convergence rates from stochastic optimization to justify that a relevant proxy for adaptation speed is the distance in parameter space between the model before and after the intervention. Using this proxy, we show that the SCM with the correct causal direction has an advantage on categorical and normal cause-effect datasets when the intervention is on the cause variable. When the intervention is on the effect variable, we provide a more nuanced picture, which shows that the fastest-to-adapt heuristic is not always valid. Code to reproduce the experiments is publicly available.
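
To make the parameter-distance proxy concrete, the following is a minimal sketch for a categorical cause-effect pair $X \to Y$ under an intervention on the cause. It rests on assumptions that the abstract does not state: each distribution is parameterized by mean-centered logits, distance is Euclidean, and the function names are hypothetical. It computes how far the parameters of the causal factorization $p(X)\,p(Y \mid X)$ and of the anticausal factorization $p(Y)\,p(X \mid Y)$ move when only $p(X)$ changes. It does not reproduce the paper's analysis or experiments.

```python
import numpy as np

def centered_logits(p):
    """Log-probabilities shifted to have zero mean along the last axis.
    (Assumed parameterization; it removes the softmax shift ambiguity.)"""
    logp = np.log(p)
    return logp - logp.mean(axis=-1, keepdims=True)

def causal_params(p_x, p_y_given_x):
    """Parameters of the X -> Y model: logits of p(X) and of each row p(Y | X=x)."""
    return np.concatenate([centered_logits(p_x),
                           centered_logits(p_y_given_x).ravel()])

def anticausal_params(p_x, p_y_given_x):
    """Parameters of the Y -> X model fitted to the same joint distribution."""
    joint = p_x[:, None] * p_y_given_x   # joint[x, y] = p(x) p(y | x)
    p_y = joint.sum(axis=0)              # marginal of the effect
    p_x_given_y = (joint / p_y).T        # row y holds p(X | Y=y)
    return np.concatenate([centered_logits(p_y),
                           centered_logits(p_x_given_y).ravel()])

def parameter_distances(p_x, p_x_new, p_y_given_x):
    """Euclidean parameter shift of each model when p(X) becomes p_x_new
    and the mechanism p(Y | X) is left untouched (intervention on the cause)."""
    d_causal = np.linalg.norm(
        causal_params(p_x_new, p_y_given_x) - causal_params(p_x, p_y_given_x))
    d_anticausal = np.linalg.norm(
        anticausal_params(p_x_new, p_y_given_x) - anticausal_params(p_x, p_y_given_x))
    return d_causal, d_anticausal

# Illustrative random instance (not data from the paper).
rng = np.random.default_rng(0)
K = 5
p_x = rng.dirichlet(np.ones(K))                  # p(X) before the intervention
p_y_given_x = rng.dirichlet(np.ones(K), size=K)  # mechanism p(Y | X), one row per x
p_x_new = rng.dirichlet(np.ones(K))              # p(X) after the intervention

d_causal, d_anticausal = parameter_distances(p_x, p_x_new, p_y_given_x)
print(f"causal model shift:     {d_causal:.3f}")
print(f"anticausal model shift: {d_anticausal:.3f}")
```

In the causal factorization only the $K$ logits of $p(X)$ move, because the mechanism $p(Y \mid X)$ is unchanged. In the anticausal factorization both $p(Y)$ and $p(X \mid Y)$ generally change. Which of the two distances is smaller on a given instance depends on the distributions involved, so the sketch only reports both values.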
