The Net Reclassification Index (NRI): A Misleading Measure of Prediction Improvement Even with Independent Test Data Sets

The Net Reclassification Index (NRI) is a widely used measure for evaluating the improvement in prediction performance gained by adding a marker to a set of baseline predictors. However, the statistical properties of this novel measure have not been explored in depth. We demonstrate the alarming result that the NRI statistic, calculated on a large test dataset using risk models fitted to a separate training set, is likely to be positive even when the new marker carries no predictive information. We also provide a related theoretical example in which an incorrect risk function that includes an uninformative marker is proven to yield a positive NRI, and we offer some insight into this phenomenon. Because large values of the NRI statistic may simply reflect the use of poorly fitting risk models, we suggest caution in using the NRI as the basis for marker evaluation. Other measures of prediction performance improvement, such as those derived from the receiver operating characteristic curve, the net benefit function, and the Brier score, cannot be made large by poorly fitting risk functions.
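
For readers unfamiliar with the statistic, the following is a minimal sketch of how an NRI might be computed on a test set. It assumes the category-free ("continuous") form, in which any upward or downward change in predicted risk counts as a reclassification; the abstract does not say which variant is meant, and the function name continuous_nri and the toy data are hypothetical.

```python
def continuous_nri(risk_old, risk_new, outcomes):
    """Category-free NRI (assumed variant, not confirmed by the source).

    NRI = [P(up | event) - P(down | event)]
        + [P(down | nonevent) - P(up | nonevent)]
    where "up" means risk_new > risk_old and "down" means risk_new < risk_old.
    """
    event_up = event_down = n_event = 0
    nonevent_up = nonevent_down = n_nonevent = 0
    for old, new, y in zip(risk_old, risk_new, outcomes):
        if y == 1:
            n_event += 1
            event_up += new > old
            event_down += new < old
        else:
            n_nonevent += 1
            nonevent_up += new > old
            nonevent_down += new < old
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one nonevent")
    nri_events = (event_up - event_down) / n_event
    nri_nonevents = (nonevent_down - nonevent_up) / n_nonevent
    return nri_events + nri_nonevents


# Toy test-set predictions (illustrative values only, not data from the paper).
y = [1, 1, 1, 0, 0, 0, 0]
baseline = [0.2, 0.3, 0.4, 0.2, 0.3, 0.1, 0.5]   # model without the marker
expanded = [0.3, 0.4, 0.3, 0.1, 0.2, 0.2, 0.5]   # model with the marker
nri = continuous_nri(baseline, expanded, y)
```

Note that the statistic depends only on the direction of each change in predicted risk, not on whether the new risks are well calibrated, which is consistent with the concern raised above that poorly fitting models can produce a positive value.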
