Nonparametric Identifiability of Causal Representations from Unknown Interventions

We study causal representation learning, the task of inferring latent causal variables and their causal relations from high-dimensional mixtures of the variables. Prior work relies on weak supervision, in the form of counterfactual pre- and post-intervention views or temporal structure; places restrictive assumptions, such as linearity, on the mixing function or latent causal model; or requires partial knowledge of the generative process, such as the causal graph or intervention targets. We instead consider the general setting in which both the causal model and the mixing function are nonparametric. The learning signal takes the form of multiple datasets, or environments, arising from unknown interventions in the underlying causal model. Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data. We study the fundamental setting of two causal variables and prove that the observational distribution and one perfect intervention per node suffice for identifiability, subject to a genericity condition. This condition rules out spurious solutions that involve fine-tuning of the intervened and observational distributions, mirroring similar conditions for nonlinear cause-effect inference. For an arbitrary number of variables, we show that at least one pair of distinct perfect interventional domains per node guarantees identifiability. Further, we demonstrate that the strengths of causal influences among the latent variables are preserved by all equivalent solutions, rendering the inferred representation appropriate for drawing causal conclusions from new data. Our study provides the first identifiability results for the general nonparametric setting with unknown interventions, and elucidates what is possible and impossible for causal representation learning without more direct supervision.
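For concreteness, the setting sketched in the abstract can be written out as follows. This is a minimal formalization in standard causal-representation-learning notation; the symbols $Z$, $U_i$, $g_i$, $f$, $G$, and the environment index $e$ are our own shorthand for illustration, not notation quoted from the paper.

\begin{align*}
  &\text{Latent SCM over } Z = (Z_1, \dots, Z_n) \text{ with graph } G: && Z_i := g_i\bigl(Z_{\mathrm{pa}(i)}, U_i\bigr), \quad U_1, \dots, U_n \text{ jointly independent}, \\
  &\text{Nonparametric mixing into observations:} && X = f(Z), \qquad f : \mathbb{R}^n \to \mathbb{R}^d \text{ a diffeomorphism onto its image}, \; d \geq n, \\
  &\text{Environments:} && P_X^{0} \text{ (observational)}; \quad P_X^{e} \text{ induced by } \mathrm{do}\bigl(Z_{i(e)} = \tilde{Z}_{i(e)}\bigr), \; i(e) \text{ unknown}, \\
  &\text{Goal:} && \text{recover } (f, G) \text{ from } \{P_X^{e}\}_{e} \text{ up to permutation and element-wise reparameterization of } Z.
\end{align*}

In these terms, the two-variable result states that, under the genericity condition, $P_X^{0}$ together with one such perfect-intervention distribution per node suffices to pin down $(f, G)$ up to the listed ambiguities.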
