Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation

Causal graphs (CGs) are compact representations of knowledge about the data-generating processes behind data distributions. When a CG is available, e.g., from domain knowledge, we can infer the conditional independence (CI) relations that should hold in the data distribution. However, it is not straightforward to incorporate this knowledge into predictive modeling. In this work, we propose a model-agnostic data augmentation method that allows us to exploit prior knowledge of the CI relations encoded in a CG for supervised machine learning. We theoretically justify the proposed method by providing an excess risk bound indicating that the proposed method suppresses overfitting by reducing the apparent complexity of the predictor hypothesis class. Using real-world data with CGs provided by domain experts, we experimentally show that the proposed method is effective in improving prediction accuracy, especially in the small-data regime.
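The abstract describes the augmentation only at a high level. As a rough illustration of the general idea of recombining observed values so that the CI relations implied by a known graph hold, here is a minimal sketch under stated assumptions: the CG is a DAG (no latent confounders), a topological order of its columns is given, and each variable's conditional distribution given its parents is approximated by Gaussian-kernel (Nadaraya-Watson-style) weights over the observed rows. The function name `cg_augment` and its `bandwidth` parameter are hypothetical, and this is not presented as the authors' exact algorithm. Because each recombined sample's weight is a product of factors that depend only on a variable's parents, the weighted augmented data factorizes according to the DAG by construction.

```python
import numpy as np


def cg_augment(X, parents, order, bandwidth=1.0):
    """Hypothetical sketch of causal-graph-based data augmentation.

    X         : (n, d) array of observed samples (the target is just one column).
    parents   : dict {column index: list of parent column indices} of a DAG.
    order     : list of all column indices in a topological order of the DAG.
    bandwidth : Gaussian-kernel bandwidth (an assumed smoothing parameter).

    Returns (Z, weights): every recombined sample and its weight (sums to 1).
    Enumerates up to n**d candidates, so it is only meant for tiny examples.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Start from one empty partial sample (NaN = not yet assigned) of weight 1.
    partial = [(np.full(d, np.nan), 1.0)]
    for i in order:
        pa = list(parents.get(i, []))
        extended = []
        for z, w in partial:
            if pa:
                # Approximate p(x_i | x_pa) by kernel weights on how close each
                # observed row's parent values are to the parents already in z.
                sq = np.sum((X[:, pa] - z[pa]) ** 2, axis=1) / (2 * bandwidth**2)
                k = np.exp(-(sq - sq.min()))  # shift by the min for stability
            else:
                # Root variable: use the empirical marginal (uniform over rows).
                k = np.ones(n)
            k = k / k.sum()
            for j in range(n):
                if k[j] > 0:
                    z_new = z.copy()
                    z_new[i] = X[j, i]  # borrow x_i from observed row j
                    extended.append((z_new, w * k[j]))
        partial = extended
    Z = np.array([z for z, _ in partial])
    weights = np.array([w for _, w in partial])
    return Z, weights
```

In a supervised setting the target is simply another column, so one would split `Z` into features and target and fit any learner that accepts per-sample weights (for example, many scikit-learn estimators take a `sample_weight` argument in `fit`). With no edges at all, every combination of observed values receives equal weight; with an edge `0 -> 1`, combinations whose column-1 value comes from a row with a similar column-0 value are favored. A practical version would need to prune the exponential enumeration, e.g. by keeping only the highest-weight partial samples.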
