Methodological Review: A review of causal inference for biomedical informatics

Causality is an important concept throughout the health sciences and is particularly vital for informatics work such as finding adverse drug events or risk factors for disease using electronic health records. Although philosophers and scientists have worked for centuries to formalize what makes something a cause, they have not reached a consensus; nevertheless, new methods for inference show that progress is possible in many practical cases. This article reviews core concepts in understanding and identifying causality and then reviews current computational methods for inference and explanation, focusing on inference from large-scale observational data. While the problem is not fully solved, we show that graphical models and Granger causality provide useful frameworks for inference and that a more recent approach based on temporal logic addresses some of the limitations of these methods.
