Statistical models are often used in medicine and public health when important gaps exist in the empirical evidence on how interventions affect health outcomes. Such models generally incorporate multiple parameters and variables whose values are uncertain. For example, lacking firm evidence that stage at diagnosis is a valid surrogate for a health outcome such as mortality, statistical models produce projections based on a chain of assumptions. Health outcomes are often projected beyond the available evidence from clinical trials, perhaps years or decades into the future, a classic "out of sample" problem (3). Such modeling requires assumptions about many quantities that are unobserved or even unobservable, such as the progression rates of preclinical biological processes.

In this issue of the Journal, a team of very experienced modelers tackles an important question: What are the benefits and harms of mammography screening after the age of 74 years? (4) They conclude that the balance of benefits and harms of routine screening mammography likely remains favorable until about age 90 years. To reach this conclusion, the authors employ three complex statistical microsimulation models, a necessity given that the well of reliable empirical evidence from randomized trials runs dry beyond age 74 years (5). The average reader will lack the time, patience, or skill to dissect the three models and their underlying assumptions, so many will have to take the model outputs emphasized in the abstract on faith. Yet most modelers recognize that identifying and studying the uncertainties in the assumptions that drive the output is as important as, and perhaps more important than, the output itself. As telegraphed in the title of the paper, the methods of estimating overdiagnosis are major drivers of the models.

A well-worn maxim from statistician George E. P. Box holds that "essentially, all models are wrong, but some are useful" (6). Box's maxim raises two key questions for any model: 1) How wrong is it? and 2) How useful is it? Every model is worth examining through the lens of those two questions.
[1] H. D. de Koning, et al. Benefits and harms of mammography screening after age 74 years: model estimates of overdiagnosis. Journal of the National Cancer Institute, 2015.
[2] P. Prorok, et al. Lead time and overdiagnosis. Journal of the National Cancer Institute, 2014.
[3] P. Gøtzsche, et al. Lead-Time Models Should Not Be Used to Estimate Overdiagnosis in Cancer Screening. Journal of General Internal Medicine, 2014.
[4] P. Gøtzsche, et al. Overestimated lead times in cancer screening has led to substantial underestimation of overdiagnosis. British Journal of Cancer, 2013.
[5] Deepak Kumar Subedi. Signal and Noise: Why So Many Predictions Fail – but Some Don't. 2013.
[6] Nate Silver. The Signal and the Noise: Why So Many Predictions Fail but Some Don't. 2012.
[7] Charles Seife. Proofiness: The Dark Arts of Mathematical Deception. 2010.
[8] Freeman Dyson. A meeting with Enrico Fermi. Nature, 2004.
[9] Derek J. Pike. Empirical Model-Building and Response Surfaces. 1988.
[10] P. C. Gøtzsche, K. J. Jørgensen. Screening for breast cancer with mammography (Review). 2020.