Toward Causal Representation Learning

The two fields of machine learning and graphical causality arose and developed separately. However, there is now cross-pollination and increasing interest in each field in benefiting from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. The benefit also runs in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is thus causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.
