A note on the evaluation of novel biomarkers: do not rely on integrated discrimination improvement and net reclassification index

The 'integrated discrimination improvement' (IDI) and the 'net reclassification index' (NRI) are statistics proposed as measures of the incremental prognostic impact that a new biomarker will have when added to an existing prediction model for a binary outcome. Both measures were designed to be intuitively appropriate, and the IDI and NRI formulae do look plausible. Both have become increasingly popular. We shall argue, however, that their use is not always safe. If IDI and NRI are used to measure gain in prediction performance, then poorly calibrated models may appear advantageous. In a simulation study, even the model that actually generates the data (and hence is the best possible model) can appear to be improved on without adding measured information. We illustrate these shortcomings in actual cancer data as well as by Monte Carlo simulations. In these examples, we contrast IDI and NRI with the area under the ROC curve (AUC) and the Brier score. Unlike IDI and NRI, these traditional measures have the characteristic that prognostic performance cannot be accidentally or deliberately inflated.
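The contrast drawn above can be made concrete with a small sketch. The abstract does not give the formulae, so the code assumes the commonly used definitions: IDI as the change in the difference between mean predicted risk in events and in non-events, and the category-free (continuous) NRI as net upward movement of predictions in events minus net upward movement in non-events. The function names and the ten-subject data set are invented for illustration and are not taken from the cancer data or simulations mentioned above. The "new" predictions contain no new information: they are the old, well-calibrated predictions pushed towards 0 and 1.

```python
from statistics import mean


def _split(y, p):
    """Return predictions for events (y == 1) and non-events (y == 0)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    return events, nonevents


def discrimination_slope(y, p):
    """Mean predicted risk in events minus mean predicted risk in non-events."""
    ev, ne = _split(y, p)
    return mean(ev) - mean(ne)


def idi(y, p_old, p_new):
    """IDI (assumed standard definition): change in discrimination slope."""
    return discrimination_slope(y, p_new) - discrimination_slope(y, p_old)


def continuous_nri(y, p_old, p_new):
    """Category-free NRI (assumed standard definition)."""
    def net_up(outcome):
        # +1 if the prediction moved up, -1 if it moved down, 0 if unchanged
        moves = [(pn > po) - (pn < po)
                 for yi, po, pn in zip(y, p_old, p_new) if yi == outcome]
        return mean(moves)
    return net_up(1) - net_up(0)


def brier(y, p):
    """Brier score: mean squared difference between prediction and outcome."""
    return mean([(pi - yi) ** 2 for yi, pi in zip(y, p)])


def auc(y, p):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    ev, ne = _split(y, p)
    wins = sum((e > n) + 0.5 * (e == n) for e in ev for n in ne)
    return wins / (len(ev) * len(ne))


# Hypothetical toy data: two risk groups of five subjects each.
# The old model is calibrated: 4/5 events at p = 0.8, 1/5 events at p = 0.2.
y = [1, 1, 1, 1, 0] + [1, 0, 0, 0, 0]
p_old = [0.8] * 5 + [0.2] * 5
# "New" model: same ranking, predictions exaggerated, no added information.
p_new = [0.9] * 5 + [0.1] * 5

print("IDI  :", round(idi(y, p_old, p_new), 3))
print("NRI  :", round(continuous_nri(y, p_old, p_new), 3))
print("AUC  :", round(auc(y, p_old), 3), "->", round(auc(y, p_new), 3))
print("Brier:", round(brier(y, p_old), 3), "->", round(brier(y, p_new), 3))
```

In this toy setting the IDI is about 0.12 and the NRI about 1.2, both suggesting an improvement, although the new predictions are merely a miscalibrated version of the old ones. The AUC stays at 0.8 because the ranking is unchanged, and the Brier score rises from 0.16 to 0.17, correctly indicating a worse model.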
