Invariant Ancestry Search

Recently, methods have been proposed that exploit the invariance of prediction models with respect to changing environments to infer subsets of the causal parents of a response variable. If the environments influence only a few of the underlying mechanisms, the subset identified by invariant causal prediction (ICP), for example, may be small or even empty. We introduce the concept of minimal invariance and propose invariant ancestry search (IAS). In its population version, IAS outputs a set that contains only ancestors of the response and is a superset of the output of ICP. When applied to data, the corresponding guarantees hold asymptotically if the underlying test for invariance has asymptotic level and power. We develop scalable algorithms and perform experiments on simulated and real data.
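The contrast between the two outputs can be made concrete. Below is a minimal Python sketch (not the authors' implementation) of the core idea on finite data: enumerate candidate predictor sets, test each for invariance across environments, and then let ICP return the intersection of all invariant sets while IAS returns the union of the minimally invariant ones (invariant sets none of whose proper subsets are invariant). The invariance check used here, a per-environment test that pooled regression residuals have mean zero, is a crude placeholder for a test with asymptotic level and power; all function names are illustrative.

```python
from itertools import chain, combinations

import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression


def is_invariant(X, y, env, subset, alpha=0.05):
    """Placeholder invariance test: regress y on X[:, subset] (pooled across
    environments) and check per environment that the residuals have mean zero.
    A stand-in for a proper test with asymptotic level and power."""
    if subset:
        model = LinearRegression().fit(X[:, subset], y)
        res = y - model.predict(X[:, subset])
    else:
        res = y - y.mean()
    pvals = [stats.ttest_1samp(res[env == e], 0.0).pvalue for e in np.unique(env)]
    # Bonferroni correction over environments; accept invariance if not rejected.
    return min(pvals) * len(pvals) > alpha


def invariant_sets(X, y, env, alpha=0.05):
    """Enumerate all predictor subsets that pass the invariance test."""
    d = X.shape[1]
    subsets = chain.from_iterable(combinations(range(d), k) for k in range(d + 1))
    return [list(s) for s in subsets if is_invariant(X, y, env, list(s), alpha)]


def icp(inv_sets):
    """ICP output: intersection of all invariant sets (a subset of the parents)."""
    if not inv_sets:
        return set()
    return set.intersection(*map(set, inv_sets))


def ias(inv_sets):
    """IAS output: union of the minimally invariant sets, i.e. invariant sets
    with no invariant proper subset (a subset of the ancestors)."""
    minimal = [s for s in inv_sets
               if not any(set(t) < set(s) for t in inv_sets)]
    return set().union(*map(set, minimal)) if minimal else set()
```

Since the intersection of all invariant sets is contained in every invariant set, and in particular in every minimally invariant one, the IAS output always contains the ICP output, consistent with the superset guarantee stated above.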
