Causality for Machine Learning

Graphical causal inference, as pioneered by Judea Pearl, arose from research on artificial intelligence (AI), and for a long time it had little connection to the field of machine learning. This article discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.
