Causal Inference in Observational Data

Our aging population increasingly suffers from multiple chronic diseases simultaneously, which calls for treating these conditions comprehensively. Finding the optimal set of drugs for a combination of diseases is a combinatorial pattern exploration problem. Association rule mining is a popular tool for such problems, but health care requires causal rather than associative patterns, which makes association rule mining unsuitable. To address this issue, we propose a novel framework, based on the Rubin-Neyman causal model, for extracting causal rules from observational data while correcting for a number of common biases. Specifically, given a set of interventions and a set of items that define subpopulations (e.g., diseases), we wish to find all subpopulations in which effective intervention combinations exist, and, in each such subpopulation, all intervention combinations such that dropping any intervention from the combination reduces the efficacy of the treatment. A key aspect of our framework is the concept of closed intervention sets, which extends the quantification of a single intervention's effect to a set of concurrent interventions. We also evaluated our causal rule mining framework on the Electronic Health Records (EHR) data of a large cohort of patients from Mayo Clinic and showed that the extracted patterns are sufficiently rich to explain the controversial findings in the medical literature regarding the effect of a class of cholesterol drugs on Type-II Diabetes Mellitus (T2DM).
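The closedness criterion above (dropping any intervention from a combination reduces its efficacy) can be illustrated with a small sketch. This is not the framework's actual estimator: the `Patient` record, the single discrete `stratum` confounder, the stratified risk-difference `effect` function, and the toy cohort are all hypothetical assumptions made for illustration. Here efficacy is assumed to be a risk reduction (control risk minus treated risk, so positive means beneficial), comparing patients who took every drug in a set against patients who took none of them, stratified on one covariate. The framework described above corrects for more biases than this single stratification does.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Patient:
    """Hypothetical patient record used only for this sketch."""
    items: frozenset      # subpopulation-defining items, e.g. diseases
    drugs: frozenset      # interventions the patient received
    outcome: bool         # True if the adverse outcome occurred
    stratum: str = "all"  # assumed single discrete confounder, e.g. an age band


def effect(patients, subpop, drugs):
    """Hypothetical stratified risk reduction of taking every drug in `drugs`
    versus none of them, among patients whose items include `subpop`.
    Positive values mean the combination lowers the outcome risk."""
    if not drugs:
        return 0.0  # the empty intervention set has no effect by definition
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for p in patients:
        if not subpop <= p.items:
            continue  # patient is outside the subpopulation
        if drugs <= p.drugs:
            strata[p.stratum]["treated"].append(p.outcome)
        elif not (drugs & p.drugs):
            strata[p.stratum]["control"].append(p.outcome)
        # patients taking only part of `drugs` are left out of the contrast
    weighted, total = 0.0, 0
    for group in strata.values():
        t, c = group["treated"], group["control"]
        if t and c:  # a stratum contributes only if both arms are present
            n = len(t) + len(c)
            weighted += n * (sum(c) / len(c) - sum(t) / len(t))
            total += n
    return weighted / total if total else math.nan


def is_closed(patients, subpop, drugs):
    """True if removing any single drug strictly lowers the estimated effect."""
    full = effect(patients, subpop, drugs)
    if math.isnan(full):
        return False
    for d in drugs:
        reduced = effect(patients, subpop, drugs - {d})
        # Subsets whose effect cannot be estimated are treated conservatively.
        if math.isnan(reduced) or reduced >= full:
            return False
    return True


def effective_closed_sets(patients, subpop, candidates, max_size=3):
    """Exhaustively enumerate sets up to `max_size` drugs that have a positive
    estimated effect and are closed."""
    found = []
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(candidates), k):
            s = frozenset(combo)
            if effect(patients, subpop, s) > 0 and is_closed(patients, subpop, s):
                found.append(s)
    return found


def toy_cohort():
    """Synthetic toy data (not from any study): 10 HTN patients per drug
    pattern, plus 10 CKD patients on A who fall outside the HTN subpopulation."""
    def group(items, drugs, n_events, n=10):
        return [Patient(frozenset(items), frozenset(drugs), i < n_events)
                for i in range(n)]
    return (group({"HTN"}, set(), 6) + group({"HTN"}, {"A"}, 2)
            + group({"HTN"}, {"B"}, 7) + group({"HTN"}, {"A", "B"}, 4)
            + group({"CKD"}, {"A"}, 10))


if __name__ == "__main__":
    patients, htn = toy_cohort(), frozenset({"HTN"})
    for s in effective_closed_sets(patients, htn, {"A", "B"}, max_size=2):
        print(sorted(s), round(effect(patients, htn, s), 2))  # ['A'] 0.35
```

On this synthetic cohort, the combination {A, B} has a positive estimated effect (about 0.2), but dropping B raises the estimate to about 0.35, so {A, B} is not closed and only {A} is reported. The exhaustive enumeration grows exponentially with `max_size`; a practical miner would need pruning, which this sketch does not attempt.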
