Causal Regularization Using Domain Priors

Neural networks leverage both causal and correlation-based relationships in data to learn models that optimize a given performance criterion, such as classification accuracy. As a result, the learned models may not reflect the true causal relationships between inputs and outputs. When domain priors on causal relationships are available at training time, it is essential that a neural network model maintain these relationships as causal, even as it learns to optimize the performance criterion. We propose a causal regularization method that incorporates such causal domain priors into the network and supports both direct and total causal effects. We show that this approach generalizes to various kinds of causal prior specifications, including monotonicity of the causal effect of a given input feature and removal of a certain influence for fairness purposes. Our experiments on eleven benchmark datasets show the usefulness of this approach in regularizing a learned neural network model to maintain desired causal effects. On most datasets, domain-prior-consistent models can be obtained without compromising accuracy.
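To make the idea concrete, the sketch below shows one plausible way such a causal regularizer could be attached to a standard training loss. It is a minimal illustration, not the paper's exact formulation: input gradients are used as a crude proxy for causal effects, and the feature indices, prior types, and trade-off weight `lam` are illustrative assumptions.

```python
import torch
import torch.nn as nn

def causal_prior_penalty(model, x, zero_effect_idx=None, monotone_idx=None):
    """Penalize gradients of the output w.r.t. selected input features so the
    learned model respects simple causal domain priors:
      - zero_effect_idx: feature whose effect should vanish (fairness prior)
      - monotone_idx:    feature whose effect should be non-decreasing
    Input gradients serve here only as a rough stand-in for causal effects."""
    x = x.clone().requires_grad_(True)
    out = model(x).sum()
    grads = torch.autograd.grad(out, x, create_graph=True)[0]

    penalty = torch.tensor(0.0, device=x.device)
    if zero_effect_idx is not None:
        # drive the effect of the protected feature toward zero
        penalty = penalty + grads[:, zero_effect_idx].pow(2).mean()
    if monotone_idx is not None:
        # penalize negative slopes to encourage a monotonically increasing effect
        penalty = penalty + torch.relu(-grads[:, monotone_idx]).mean()
    return penalty

# Usage: add the penalty to the task loss with a regularization weight.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x, y = torch.randn(64, 10), torch.randn(64, 1)
lam = 0.1  # illustrative trade-off between accuracy and prior consistency
loss = criterion(model(x), y) + lam * causal_prior_penalty(
    model, x, zero_effect_idx=3, monotone_idx=0)
opt.zero_grad()
loss.backward()
opt.step()
```

The weight `lam` controls the trade-off between fitting the data and satisfying the causal priors; in this framing, setting it too high can suppress legitimate predictive signal, while setting it too low leaves the prior unenforced.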
