Desiderata for Representation Learning: A Causal Perspective

Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data. This learning problem is often approached by describing various desiderata associated with learned representations; e.g., that they be non-spurious, efficient, or disentangled. It can be challenging, however, to turn these intuitive desiderata into formal criteria that can be measured and enhanced based on observed data. In this paper, we take a causal perspective on representation learning, formalizing non-spuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning) using counterfactual quantities and observable consequences of causal assertions. This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest and learn non-spurious and disentangled representations from single observational datasets.

[1]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[2]  David Duvenaud,et al.  Isolating Sources of Disentanglement in VAEs , 2018, 1802.04942.

[3]  Bernhard Schölkopf,et al.  Generalization in anti-causal learning , 2018, ArXiv.

[4]  Pascal Frossard,et al.  Graph-based Isometry Invariant Representation Learning , 2017, ICML.

[5]  Michael J. Paul,et al.  Feature Selection as Causal Inference: Experiments with Text Classification , 2017, CoNLL.

[6]  Jin Tian,et al.  Probabilities of causation: Bounds and identification , 2000, Annals of Mathematics and Artificial Intelligence.

[7]  Rajesh Ranganath,et al.  Predictive Modeling in the Presence of Nuisance-Induced Spurious Correlations , 2021, ArXiv.

[8]  Dhanya Sridhar,et al.  Adapting Text Embeddings for Causal Inference , 2020, UAI.

[9]  L. Deng,et al.  The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web] , 2012, IEEE Signal Processing Magazine.

[10]  Andriy Mnih,et al.  Disentangling by Factorising , 2018, ICML.

[11]  Zhitang Chen,et al.  Weakly Supervised Disentangled Generative Causal Representation Learning , 2020, J. Mach. Learn. Res..

[12]  Ilya Shpitser,et al.  Semi-Parametric Causal Sufficient Dimension Reduction Of High Dimensional Treatments , 2017 .

[13]  Philipp Koehn,et al.  Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2016 .

[14]  Rajesh Ranganath,et al.  Support and Invertibility in Domain-Invariant Representations , 2019, AISTATS.

[15]  Ang Li,et al.  Causes of Effects: Learning individual responses from population data , 2021, IJCAI.

[16]  Guillaume Desjardins,et al.  Understanding disentangling in $\beta$-VAE , 2018, 1804.03599.

[17]  Chenhao Tan,et al.  Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End , 2021, AIES.

[18]  Elias Bareinboim,et al.  A Calculus for Stochastic Interventions: Causal Effect Identification and Surrogate Experiments , 2020, AAAI.

[19]  Bernhard Schölkopf,et al.  Semi-supervised Learning in Causal and Anticausal Settings , 2013, Empirical Inference.

[20]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[21]  Stefan Bauer,et al.  Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness , 2018, ICML.

[22]  Madeleine Udell,et al.  Why Are Big Data Matrices Approximately Low Rank? , 2017, SIAM J. Math. Data Sci..

[23]  David M. Blei,et al.  Frequentist Consistency of Variational Bayes , 2017, Journal of the American Statistical Association.

[24]  G. Stewart,et al.  Rank degeneracy and least squares problems , 1976 .

[25]  Charles Blundell,et al.  Representation Learning via Invariant Causal Mechanisms , 2020, ICLR.

[26]  Zhitang Chen,et al.  CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Zhao Wang,et al.  Identifying spurious correlations for robust text classification , 2020, FINDINGS.

[28]  Yue Lu,et al.  Latent aspect rating analysis without aspect keyword supervision , 2011, KDD.

[29]  Alexander D'Amour,et al.  On Multi-Cause Approaches to Causal Inference with Unobserved Counfounding: Two Cautionary Failure Cases and A Promising Alternative , 2019, AISTATS.

[30]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  J. Pearl Causal diagrams for empirical research , 1995 .

[32]  Bernhard Schölkopf,et al.  Learning Independent Causal Mechanisms , 2017, ICML.

[33]  Ben Poole,et al.  Weakly-Supervised Disentanglement Without Compromises , 2020, ICML.

[34]  Huan Liu,et al.  Deep causal representation learning for unsupervised domain adaptation , 2019, ArXiv.

[35]  Adler J. Perotte,et al.  Multiple Causal Inference with Latent Confounding , 2018, ArXiv.

[36]  Rob Brekelmans,et al.  Invariant Representations without Adversarial Training , 2018, NeurIPS.

[37]  P. Donnelly,et al.  Inference of population structure using multilocus genotype data. , 2000, Genetics.

[38]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[39]  G. Imbens,et al.  Why Ask Why? Forward Causal Inference and Reverse Causal Questions , 2013 .

[40]  Daan Wierstra,et al.  Stochastic Back-propagation and Variational Inference in Deep Latent Gaussian Models , 2014, ArXiv.

[41]  Pietro Perona,et al.  Causal feature learning: an overview , 2017 .

[42]  Edoardo M. Airoldi,et al.  Mixed Membership Stochastic Blockmodels , 2007, NIPS.

[43]  Sebastian Nowozin,et al.  Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations , 2017, AAAI.

[44]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[45]  Abhishek Kumar,et al.  Variational Inference of Disentangled Latent Concepts from Unlabeled Observations , 2017, ICLR.

[46]  Tonio Ball,et al.  Causal and anti-causal learning in pattern recognition for neuroimaging , 2015, 2014 International Workshop on Pattern Recognition in Neuroimaging.

[47]  Justin Grimmer,et al.  Causal Inference with Latent Treatments , 2021, American Journal of Political Science.

[48]  David M. Blei,et al.  A Proxy Variable View of Shared Confounding , 2021, ICML.

[49]  David M. Blei,et al.  Using Embeddings to Correct for Unobserved Confounding , 2019, NeurIPS.

[50]  David M. Blei,et al.  Towards Clarifying the Theory of the Deconfounder , 2020, ArXiv.

[51]  Uri Shalit,et al.  Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects , 2020, ArXiv.

[52]  Pietro Perona,et al.  Visual Causal Feature Learning , 2014, UAI.

[53]  Yining Chen,et al.  Weakly Supervised Disentanglement with Guarantees , 2020, ICLR.

[54]  Han Zhao,et al.  On Learning Invariant Representations for Domain Adaptation , 2019, ICML.

[55]  Babak Salimi,et al.  Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals , 2021, SIGMOD Conference.

[56]  Bernhard Schölkopf,et al.  Semi-supervised interpolation in an anticausal learning scenario , 2015, J. Mach. Learn. Res..

[57]  Bernhard Schölkopf,et al.  Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations , 2018, ICML.

[58]  Yunxiao Chen,et al.  Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications , 2017, Journal of the American Statistical Association.

[59]  David M. Blei,et al.  Variational Inference: A Review for Statisticians , 2016, ArXiv.

[60]  Yue Lu,et al.  Latent aspect rating analysis on review text data: a rating regression approach , 2010, KDD.

[61]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[62]  Kosuke Imai,et al.  Comment: The Challenges of Multiple Causes , 2019, Journal of the American Statistical Association.

[63]  Alexander D'Amour,et al.  Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests , 2021, ArXiv.

[64]  R. Scheines,et al.  Interventions and Causal Inference , 2007, Philosophy of Science.

[65]  Philippe Beaudoin,et al.  Disentangling the independently controllable factors of variation by interacting with the world , 2018, ArXiv.

[66]  Stephen E. Fienberg,et al.  Bayesian Mixed Membership Models for Soft Clustering and Classification , 2004, GfKl.

[67]  Justin Grimmer,et al.  Discovery of Treatments from Text Corpora , 2016, ACL.

[68]  Ankur Taly,et al.  Local Explanations via Necessity and Sufficiency: Unifying Theory and Practice , 2021, UAI.

[69]  Stefano Soatto,et al.  Emergence of Invariance and Disentanglement in Deep Representations , 2017, 2018 Information Theory and Applications Workshop (ITA).

[70]  Francesco Locatello,et al.  Is Independence all you need? On the Generalization of Representations Learned from Correlated Data , 2020, ArXiv.

[71]  Philippe Beaudoin,et al.  Independently Controllable Factors , 2017, ArXiv.

[72]  Geoffrey J. McLachlan,et al.  Mixture models : inference and applications to clustering , 1989 .

[73]  Richard Zemel,et al.  Environment Inference for Invariant Learning , 2021, ICML.

[74]  Bernhard Schölkopf,et al.  On causal and anticausal learning , 2012, ICML.

[75]  Haruo Hosoya,et al.  Group-based Learning of Disentangled Representations with Generalizability for Novel Contents , 2019, IJCAI.

[76]  David Blei,et al.  Invariant Representation Learning for Treatment Effect Estimation , 2020, UAI.

[77]  David Lopez-Paz,et al.  Invariant Risk Minimization , 2019, ArXiv.

[78]  Aapo Hyvärinen,et al.  Variational Autoencoders and Nonlinear ICA: A Unifying Framework , 2019, AISTATS.

[79]  Michael E. Tipping,et al.  Probabilistic Principal Component Analysis , 1999 .

[80]  David M. Blei,et al.  The Blessings of Multiple Causes , 2018, Journal of the American Statistical Association.

[81]  Xiao-Li Meng,et al.  POSTERIOR PREDICTIVE ASSESSMENT OF MODEL FITNESS VIA REALIZED DISCREPANCIES , 1996 .

[82]  Stefan Bauer,et al.  Disentangling Factors of Variations Using Few Labels , 2020, ICLR.

[83]  Judea Pearl,et al.  The seven tools of causal inference, with reflections on machine learning , 2019, Commun. ACM.

[84]  Kunpeng Li,et al.  Maximum Likelihood Estimation and Inference for Approximate Factor Models of High Dimension , 2016, Review of Economics and Statistics.

[85]  P. Cheng,et al.  Causal Invariance as an Essential Constraint for Creating a Causal Representation of the World , 2017 .

[86]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[87]  Alexander D'Amour,et al.  Underspecification Presents Challenges for Credibility in Modern Machine Learning , 2020, J. Mach. Learn. Res..

[88]  Kenji Fukumizu,et al.  Intact-VAE: Estimating Treatment Effects under Unobserved Confounding , 2021, ArXiv.

[89]  Peter Green,et al.  Markov chain Monte Carlo in Practice , 1996 .

[90]  William Yang Wang,et al.  Disentangled Representation Learning with Wasserstein Total Correlation , 2019, ArXiv.

[91]  Adler J. Perotte,et al.  Causal Estimation with Functional Confounders , 2020, NeurIPS.

[92]  Christina Heinze-Deml,et al.  Causal Structure Learning , 2017, 1706.09141.

[93]  Alexander D'Amour,et al.  Overlap in observational studies with high-dimensional covariates , 2017, Journal of Econometrics.

[94]  Zhao Wang,et al.  Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals , 2020, AAAI.

[95]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).