Projecting the benefits and harms of mammography using statistical models: proof or proofiness?

Statistical models are often used in medicine and public health when there are important gaps in the empirical evidence on how interventions affect health outcomes. These models generally incorporate multiple parameters and variables whose values are uncertain. For example, in the absence of firm evidence that stage at diagnosis is a valid surrogate for a health outcome such as mortality, statistical models produce projections based on a chain of assumptions. Health outcomes are often projected beyond the available evidence from clinical trials, perhaps years or decades into the future, a classic “out of sample” problem (3). Such modeling requires assumptions about quantities that are often unobserved or even unobservable, such as the progression rates of preclinical biological processes.

In this issue of the Journal, a team of very experienced modelers tackles an important question: What are the benefits and harms of mammography screening after the age of 74 years? (4) They conclude that the balance of benefits and harms of routine screening mammography is likely to remain positive until about age 90 years. To reach this conclusion, the authors employ three complex statistical microsimulation models, a necessity given that the well of reliable empirical evidence from randomized trials runs dry beyond the age of 74 years (5). The average reader will lack the time, patience, or skill to dissect the three models or their underlying assumptions. Many readers will therefore have to take on faith the model outputs emphasized in the abstract, even though most modelers recognize that identifying and studying the uncertainties in the assumptions that drive the output is as important as, and perhaps more important than, the output itself. As telegraphed in the title of the paper, the methods used to estimate overdiagnosis are major drivers of the models' outputs. A well-worn maxim from the statistician George E. P. Box is that “essentially, all statistical models are wrong, but some are useful” (6).
That maxim raises two key questions worth asking of every model: 1) How wrong is it? and 2) How useful is it?