Use of directed acyclic graphs (DAGs) to identify confounders in applied health research: review and recommendations

Abstract

Background: Directed acyclic graphs (DAGs) are an increasingly popular approach for identifying confounding variables that require conditioning when estimating causal effects. This review examined the use of DAGs in applied health research to inform recommendations for improving their transparency and utility in future research.

Methods: Original health research articles published during 1999–2017 mentioning ‘directed acyclic graphs’ (or similar) or citing DAGitty were identified from Scopus, Web of Science, Medline and Embase. Data were extracted on the reporting of: estimands, DAGs and adjustment sets, alongside the characteristics of each article’s largest DAG.

Results: A total of 234 articles were identified that reported using DAGs. A fifth (n = 48, 21%) reported their target estimand(s) and half (n = 115, 48%) reported the adjustment set(s) implied by their DAG(s). Two-thirds of the articles (n = 144, 62%) made at least one DAG available. DAGs varied in size but averaged 12 nodes [interquartile range (IQR): 9–16, range: 3–28] and 29 arcs (IQR: 19–42, range: 3–99). The median saturation (i.e. percentage of total possible arcs) was 46% (IQR: 31–67, range: 12–100). 37% (n = 53) of the DAGs included unobserved variables, 17% (n = 25) included ‘super-nodes’ (i.e. nodes containing more than one variable) and 34% (n = 49) were visually arranged so that the constituent arcs flowed in the same direction (e.g. top-to-bottom).

Conclusion: There is substantial variation in the use and reporting of DAGs in applied health research. Although this partly reflects their flexibility, it also highlights some potential areas for improvement. This review hence offers several recommendations to improve the reporting and use of DAGs in future research.
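The saturation measure reported in the Results (percentage of total possible arcs) can be illustrated with a short calculation. The sketch below assumes that the total possible number of arcs in a DAG with n nodes is n(n − 1)/2, which is the most an acyclic graph can hold; the abstract does not state the formula, and the function names `max_arcs` and `saturation` are hypothetical.

```python
def max_arcs(n_nodes: int) -> int:
    """Largest number of arcs a DAG on n_nodes nodes can contain.

    Assumption (not stated in the abstract): at most one arc per pair of
    nodes, giving n(n - 1)/2 possible arcs.
    """
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    return n_nodes * (n_nodes - 1) // 2


def saturation(n_nodes: int, n_arcs: int) -> float:
    """Saturation as a percentage: observed arcs over possible arcs."""
    possible = max_arcs(n_nodes)
    if possible == 0:
        raise ValueError("saturation is undefined for fewer than two nodes")
    if not 0 <= n_arcs <= possible:
        raise ValueError("n_arcs must lie between 0 and the maximum possible")
    return 100.0 * n_arcs / possible


# A DAG with the reported typical size: 12 nodes and 29 arcs.
print(max_arcs(12))                  # 66 possible arcs
print(round(saturation(12, 29), 1))  # 43.9 (percent)

# The smallest reported DAG size (3 nodes) with 3 arcs is fully saturated.
print(saturation(3, 3))              # 100.0
```

A DAG of typical size gives roughly 44% here, not the reported median of 46%. That is expected: the median of per-DAG saturations need not equal the saturation computed from the typical node and arc counts.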
