Causal Relational Learning

Causal inference is at the heart of empirical research in the natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is the randomized controlled trial; unfortunately, such trials are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational data have been developed in statistics and the social sciences. However, existing methods critically rely on restrictive assumptions, such as the study population consisting of homogeneous elements that can be represented in a single flat table in which each row is referred to as a unit. In contrast, in many real-world settings the study domain consists of heterogeneous elements with complex relational structure, and the data is naturally represented in multiple related tables. In this paper, we present a formal framework for causal inference from such relational data. We propose a declarative language called CARL for capturing causal background knowledge and assumptions, and for specifying causal queries using simple Datalog-like rules. CARL provides a foundation for inferring causality and reasoning about the effect of complex interventions in relational domains. We present an extensive experimental evaluation on real relational data to illustrate the applicability of CARL in the social sciences and healthcare.
