Limitations and Misinterpretations of E-Values for Sensitivity Analyses of Observational Studies

Residual confounding is a major threat for observational studies (1, 2). Most such studies do mention confounding (3), but many simply assert, without supporting evidence, that their results are robust to potential confounding (3). A rich literature already exists on sensitivity analyses for confounder evaluation (4–11), and many epidemiologic studies have used these methods. However, there is still much room for improvement in the handling of confounding for the average observational study. On the basis of earlier work, VanderWeele and Ding (12) recently proposed a standardized approach to sensitivity analyses for confounding. They introduced the E-value, defined as the minimum strength of association that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away a specific treatment–outcome association, conditional on the measured covariates (12). The authors proposed wide application of E-values in observational studies to improve evaluation of causality and strengthen science (12). The E-value is a smart, ingenious heuristic that aims to address a serious, recalcitrant problem. However, we fear that it has major conceptual and validity problems and could eventually facilitate and automate misleading claims. Instead of potentially wide misuse of E-values, we propose a systematic, nuanced approach to discussion of confounding in observational studies.

Conceptual and Interpretation Problems

E-Values Have a Monotonic, Almost Linear Relationship With Effect Estimates, Thus Offering No Additional Information Beyond What Effect Estimates Can Convey

The formula for calculating the E-value (12) is a simple function of the effect estimate. For example, for risk ratio (RR), the formula is:

$$\text{E-value} = \mathrm{RR} + \sqrt{\mathrm{RR} \times (\mathrm{RR} - 1)}$$

As observed by Localio and colleagues (13), the E-value is almost linearly related to the absolute value of the effect estimate and the relationship is monotonic: The more the effect estimate deviates from the null, the larger the E-value.
A given effect estimate always produces the same E-value; for example, an RR of 3.9 always produces an E-value of 7.26.

Whereas Effect Estimates Are Based on Real Data, E-Values May Make Unrealistic Assumptions

Although E-values and effect estimates are monotonically related and potentially interchangeable, they differ in meaning, relevance, and relationship to reality. Effect estimates refer to the research question of interest and are obtained from real data. Data and analyses may well be biased, but effect estimates convey what researchers observed, right or wrong. In contrast, E-values are sensitivity analyses that are hypothetical and potentially unrealistic. The E-value has been presented as "sensitivity analysis without assumptions" (14), but it does in fact make assumptions, and they are often untenable. We fear that the assumption that the unmeasured confounder is equally related to exposure and outcome (12, 14) is often modestly or grossly inaccurate, on the basis of our observations about associations between variables in epidemiology (15).

Whether unaccounted-for confounders are single, few, or many is typically unknown. The retort that this is just a heuristic aiming to capture the composite effect of multiple unmeasured confounders offers little reassurance. Conceptually, the rationale for a single variable that captures the influence of multiple confounders is close to that guiding the use of propensity scores and disease risk scores. For example, in the numerical example given earlier, it sounds reassuringly unlikely that a single confounder could have an RR of 7.26 in its association with both the exposure and the outcome. However, if dozens of unknown confounders exist, such a composite effect might not be totally implausible, even if each confounder's strength of association is modest.
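To make the arithmetic behind the numerical example concrete, the following is a minimal Python sketch of the RR-based E-value calculation. The function name `e_value` is ours and hypothetical. Inverting protective estimates (RR below 1) before applying the formula is an assumption drawn from the usual description of the method, not something stated in this article. The comparison column `2*RR - 0.5` is our own large-RR approximation, included only to show how close to linear the relationship is.

```python
import math


def e_value(rr: float) -> float:
    """Return the E-value for a risk ratio: RR + sqrt(RR * (RR - 1))."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Assumption (not stated in this article): estimates below 1 are
    # inverted first, so the formula is applied to a ratio of at least 1.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


if __name__ == "__main__":
    # The same RR always maps to the same E-value, and the mapping is
    # monotonic and close to the straight line 2*RR - 0.5 for larger RRs.
    for rr in (1.0, 1.5, 2.0, 3.9, 10.0):
        linear_approx = 2.0 * rr - 0.5
        print(f"RR={rr:5.2f}  E-value={e_value(rr):6.2f}  2*RR-0.5={linear_approx:6.2f}")
```

For an RR of 3.9 the function returns approximately 7.26, matching the example in the text. Because the output depends on nothing but the effect estimate, the sketch also illustrates why the E-value carries no information that the estimate itself does not.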
No Guidance Exists on What Is a Small Enough E-Value

The original article introducing E-values (12) wisely avoided giving specific guidance on the range within which an E-value should be deemed small (and thus residual confounding is a serious threat) and where it becomes large enough that a researcher need not worry about residual confounding in different contexts. Users of the biomedical literature can have some sense about what constitutes large, modest, and small effect estimates. Effect estimates can be assessed on the basis of their perceived clinical meaning, whereas E-values operate on a scale with which users are unfamiliar. We have extensive, century-long experience of what effect estimates look like in different fields and which effect estimates tend to be validated or refuted in the literature. We lack such prior insights on E-values.

Problems Arise With Dependence on Effect Estimates and CIs

All metrics depending on effect estimates, not just E-values, have some conceptual caveats. Strength of association is 1 of the 9 considerations for causality (16). Hill (17) was very cautious to avoid claiming that any of these considerations offer absolute proof of causality. Large effects, like those for smoking with lung cancer, are now uncommon (18), whereas small or tiny effects (19) have become more common, overwhelmingly so in such fields as genetic epidemiology. When practicing epidemiologists in omics fields come across large effect estimates, our first thought is of undetected errors. We worry that large effect estimates (and thus large E-values) sometimes reflect errors and biases, particularly in the presence of selective analyses and outcomes reporting. Moreover, effect estimates depend on the exposure contrast; empirical evaluation (20) has shown that when effects are genuinely small, investigators choose more extreme exposure contrasts to report. Almost any E-value can be obtained, depending on choice of exposure contrast.
Finally, VanderWeele and Ding proposed that E-values be calculated for both the point estimate and the limit of the 95% CI closest to the null (12). This requires another arbitrary choice about CI level; such choices also affect other types of sensitivity analyses for confounding.

The Automation of E-Values May Give an Excuse Not to Think Seriously About Confounding

Calculation of an E-value by itself does not remove residual confounding and does not offer any insights about what the unmeasured, unknown confounders might be. It does not even suggest known but unaccounted-for confounders. Most epidemiologic studies do not discuss the implications of confounding in depth (3). E-values would be interpretable only in the context of each specific study after careful consideration of its design features. Automated methodological tools have been misused because they allow investigators to say that by running them they gain some statistical aura (for example, P values) or exorcize the influence of bias (for example, publication bias tests). We fear that E-values may be misused to offer false reassurance that residual confounding is not a concern. Subjective interpretations that dismiss the threat of confounding may arise when investigators are biased to defend their discoveries. The opposite scenario may also be seen: Some investigators may be biased to show that an effect is not causal or important and may use the E-value to conclude that confounding caused the effect.

Other Biases May Still Undermine the Results

Of note, we worry that E-values may offer a spurious shortcut for arbitrating on causality while ignoring many other major biases (beyond confounding) that are important to consider in evaluating causation. As acknowledged by VanderWeele and Ding (12, 14) and in the editorial by Localio and colleagues (13), measurement error, attrition bias, and selective reporting are major threats for observational associations.
Given the exploratory nature of much observational research and the lack of preregistration of protocols and analyses, selective reporting bias may be more influential than confounding for most observational epidemiology. These additional biases are currently impossible to tackle for most epidemiologic publications without endorsement of open science, data sharing, and reproducible research practices (21).

Conclusions and Recommendations for Handling Confounding in Observational Studies

E-Values Are Problematic

E-values are a simple transformation of effect estimates. We believe that they have the potential to be widely misused and do more harm than good. Instead of promoting automated, subpar solutions to a serious problem, investigators of observational studies should focus on preemptively addressing the major threats to their study's validity and carefully discussing the study's residual limitations. Studies with the same E-values can differ greatly in this regard. Although we trust that VanderWeele and Ding would not endorse misuse, we fear that the perception that a single number or 2 numbers can absolve a study of confounding, or of bias in general, can lay the ground for gross misconceptions. Different fields of observational research vary in their susceptibility to confounding (for example, nutritional vs. genetic epidemiology). E-values may weaken rather than strengthen reasoning. Instead of espousing misused E-values, researchers need to consider confounding in a systematic, thorough, and balanced way in observational epidemiologic studies.

Using Existing Guidance on Handling Confounding and Reporting

Excellent literature (4–11) describes how to use sensitivity analyses for confounding; these methods can be used with careful thinking about the assumptions.
E-values are 1 among many possible sensitivity analyses, and researchers should consider their potential limitations and misinterpretations (described earlier) before selecting and reporting this sensitivity analysis method. Existing guidance on reporting epidemiologic studies, such as STROBE (STrengthening the Reporting of Observational Studies in Epidemiology) (22) and its extensions to specific subfields, may also help researchers organize the presentatio

[1] George Hripcsak, et al. Improving reproducibility by using high-throughput observational studies with empirical calibration, 2018, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[2] Warren W. Kretzschmar, et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression, 2017, Nature Genetics.

[3] J. Ioannidis, et al. Interpretation of epidemiologic studies very often lacked adequate consideration of confounding, 2018, Journal of Clinical Epidemiology.

[4] Tyler J. VanderWeele, et al. Sensitivity Analysis in Observational Research: Introducing the E-Value, 2017, Annals of Internal Medicine.

[5] A. Localio, et al. Sensitivity Analysis for Unmeasured Confounding: E-Values for Observational Studies, 2017, Annals of Internal Medicine.

[6] D. Lawlor, et al. Sensitivity analysis for the effects of multiple unmeasured confounders, 2016, Annals of Epidemiology.

[7] John P. A. Ioannidis, et al. Exposure‐wide epidemiology: revisiting Bradford Hill, 2016, Statistics in Medicine.

[8] Tyler J. VanderWeele, et al. Sensitivity Analysis Without Assumptions, 2015, Epidemiology.

[9] David M. Evans, et al. Mendelian Randomization: New Applications in the Coming Age of Hypothesis-Free Causality, 2015, Annual Review of Genomics and Human Genetics.

[10] Brian A. Nosek, et al. Promoting an open research culture, 2015, Science.

[11] William DuMouchel, et al. Interpreting observational studies: why empirical calibration is needed to correct p-values, 2013, Statistics in Medicine.

[12] M. Schuemie, et al. Variation in Choice of Study Design: Findings from the Epidemiology Design Decision Inventory and Evaluation (EDDIE) Survey, 2013, Drug Safety.

[13] J. Ioannidis, et al. Risk factors and interventions with statistically significant tiny effects, 2011, International Journal of Epidemiology.

[14] John P. A. Ioannidis, et al. Researching Genetic Versus Nongenetic Determinants of Disease: A Comparison and Proposed Unification, 2009, Science Translational Medicine.

[15] Douglas G. Altman, et al. [The Strengthening the Reporting of Observational Studies in Epidemiology [STROBE] statement: guidelines for reporting observational studies], 2007, Gaceta Sanitaria.

[16] S. Pocock, et al. Strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies, 2007, BMJ: British Medical Journal.

[17] Jonathan A. C. Sterne, et al. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study, 2007, American Journal of Epidemiology.

[18] George Liberopoulos, et al. Selection in Reported Epidemiological Risks: An Empirical Assessment, 2007, PLoS Medicine.

[19] Abba M. Krieger, et al. Causal conclusions are most sensitive to unobserved binary covariates, 2006, Statistics in Medicine.

[20] R. Kronmal, et al. Assessing the sensitivity of regression results to unmeasured confounders in observational studies, 1998, Biometrics.

[21] W. Flanders, et al. Indirect Assessment of Confounding: Graphic Description and Limits on Effect of Adjusting for Covariates, 1990, Epidemiology.

[22] Takashi Yanagawa, et al. Case-control studies: Assessing the effect of a confounding factor, 1984.

[23] D. Rubin, et al. Assessing Sensitivity to an Unobserved Binary Covariate in an Observational Study with Binary Outcome, 1983.

[24] J. Schlesselman. Assessing effects of confounding variables, 1978, American Journal of Epidemiology.

[25] A. B. Hill. The Environment and Disease: Association or Causation?, 1965, Proceedings of the Royal Society of Medicine.

[26] E. C. Hammond, et al. Smoking and lung cancer: recent evidence and a discussion of some questions, 1959, Journal of the National Cancer Institute.