Causality for Machine Learning

Graphical causal inference, as pioneered by Judea Pearl, arose from research on artificial intelligence (AI) and for a long time had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.
