Towards Causal Representation Learning

The two fields of machine learning and graphical causality arose and developed separately. However, there is now cross-pollination and increasing interest in both fields to benefit from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is thus causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.
