An Analysis of the Adaptation Speed of Causal Models

We consider the problem of discovering the causal process that generated a collection of datasets. We assume that all these datasets were generated by unknown sparse interventions on a structural causal model (SCM) $G$, which we want to identify. Recently, Bengio et al. (2020) argued that among all SCMs, $G$ is the fastest to adapt from one dataset to another, and proposed a meta-learning criterion to identify the causal direction in a two-variable SCM. While the experiments were promising, the theoretical justification was incomplete. Our contribution is a theoretical investigation of the adaptation speed of simple two-variable SCMs. We use convergence rates from stochastic optimization to justify that a relevant proxy for adaptation speed is the distance in parameter space after intervention. Using this proxy, we show that the SCM with the correct causal direction is advantaged for categorical and normal cause-effect datasets when the intervention is on the cause variable. When the intervention is on the effect variable, we provide a more nuanced picture, which highlights that the fastest-to-adapt heuristic is not always valid. Code to reproduce the experiments is available at this https URL
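
As an illustration of the parameter-distance proxy, the following minimal sketch compares how far the parameters of a causal ($X \to Y$) and an anticausal ($Y \to X$) categorical model must move after an intervention on the cause. It is not the authors' released code. The mean-zero logit parameterization, the Dirichlet sampling of the distributions, and all function names are assumptions made for this example.

```python
import numpy as np

def centered_logits(p, axis=-1):
    """Log-probabilities shifted to zero mean along `axis`.

    Assumption of this sketch: softmax logits are only defined up to a
    constant, so we pick the mean-zero representative to make them unique.
    """
    logp = np.log(p)
    return logp - logp.mean(axis=axis, keepdims=True)

def causal_params(p_x, p_y_given_x):
    """Parameters of the model X -> Y: logits of p(X) and of each p(Y | X=x)."""
    return np.concatenate([
        centered_logits(p_x),
        centered_logits(p_y_given_x, axis=1).ravel(),
    ])

def anticausal_params(p_x, p_y_given_x):
    """Parameters of the model Y -> X fitted to the same joint distribution."""
    joint = p_x[:, None] * p_y_given_x   # joint[x, y] = p(x) * p(y | x)
    p_y = joint.sum(axis=0)              # marginal of the effect
    p_x_given_y = (joint / p_y).T        # row y holds p(X | Y=y)
    return np.concatenate([
        centered_logits(p_y),
        centered_logits(p_x_given_y, axis=1).ravel(),
    ])

rng = np.random.default_rng(0)
K = 5                                            # number of categories per variable
p_x = rng.dirichlet(np.ones(K))                  # marginal of the cause
p_y_given_x = rng.dirichlet(np.ones(K), size=K)  # row x holds p(Y | X=x)

# Intervention on the cause: p(X) changes, the mechanism p(Y | X) does not.
p_x_new = rng.dirichlet(np.ones(K))

d_causal = np.linalg.norm(
    causal_params(p_x_new, p_y_given_x) - causal_params(p_x, p_y_given_x)
)
d_anticausal = np.linalg.norm(
    anticausal_params(p_x_new, p_y_given_x) - anticausal_params(p_x, p_y_given_x)
)

print(f"causal distance:     {d_causal:.3f}")
print(f"anticausal distance: {d_anticausal:.3f}")
```

In this toy setup only the marginal logits move in the causal model, while every conditional in the anticausal model shifts as well, so the causal distance comes out no larger than the anticausal one. The sketch only measures the proxy; it does not run any optimization or estimate adaptation speed from samples.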