Net reclassification indices for evaluating risk prediction instruments: a critical review.

Net reclassification indices have recently become popular statistics for measuring the prediction increment of new biomarkers. We review the various types of net reclassification indices and their correct interpretations, and we evaluate the advantages and disadvantages of quantifying the prediction increment with these indices. For predefined risk categories, we relate net reclassification indices to existing measures of the prediction increment. We also consider statistical methodology for constructing confidence intervals for net reclassification indices and evaluate the merits of hypothesis testing based on such indices. We recommend that investigators using net reclassification indices report them separately for events (cases) and nonevents (controls). When there are two risk categories, the components of the net reclassification index are the same as the changes in the true-positive rate and the false-positive rate (the nonevent component is the change in the false-positive rate with its sign reversed). We advocate reporting true- and false-positive rates and suggest that investigators retain these existing, descriptive terms. When there are three or more risk categories, we recommend against net reclassification indices, because they do not adequately account for clinically important differences in shifts among risk categories. The category-free net reclassification index is a new descriptive device designed to avoid predefined risk categories. However, it shares many of the problems of other measures, such as the area under the receiver operating characteristic curve. In addition, the category-free index can mislead investigators by overstating the incremental value of a biomarker, even in independent validation data. When investigators want to test a null hypothesis of no prediction increment, the well-established tests for coefficients in the regression model are superior to the net reclassification index.
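As a minimal sketch of the event/nonevent decomposition and the two-category identity described above, the Python code below computes both NRI components from old and new risk categories and compares them with the changes in the true- and false-positive rates. The simulated data, the cutoff t = 0.2, and the helper names (`categorize`, `nri_components`, `positive_rate`) are illustrative assumptions, not taken from the paper. Passing raw predicted risks instead of category indices gives the category-free NRI, because only the direction of each subject's change counts.

```python
import numpy as np


def categorize(risk, cutoffs):
    """Map predicted risks to category indices 0..len(cutoffs) (higher = riskier)."""
    return np.digitize(risk, cutoffs)


def nri_components(old, new, event):
    """Return (event NRI, nonevent NRI).

    `old` and `new` may be category indices (categorical NRI) or raw
    predicted risks (category-free NRI): only the direction of change matters.
    """
    old, new = np.asarray(old), np.asarray(new)
    event = np.asarray(event, dtype=bool)
    up, down = new > old, new < old
    # Events are correctly reclassified when they move up.
    nri_event = up[event].mean() - down[event].mean()
    # Nonevents are correctly reclassified when they move down.
    nri_nonevent = down[~event].mean() - up[~event].mean()
    return nri_event, nri_nonevent


def positive_rate(cat, mask):
    """Fraction of the subjects selected by `mask` placed in the high-risk category (1)."""
    return (cat[mask] == 1).mean()


# Simulated data, purely illustrative (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n = 2000
event = rng.random(n) < 0.2
risk_old = np.clip(0.15 + 0.10 * event + rng.normal(0, 0.10, n), 0, 1)
risk_new = np.clip(risk_old + 0.05 * event - 0.02 * ~event + rng.normal(0, 0.05, n), 0, 1)

# Two categories split at a hypothetical threshold t = 0.2.
t = 0.2
old2, new2 = categorize(risk_old, [t]), categorize(risk_new, [t])
nri_e, nri_ne = nri_components(old2, new2, event)

delta_tpr = positive_rate(new2, event) - positive_rate(old2, event)
delta_fpr = positive_rate(new2, ~event) - positive_rate(old2, ~event)
print(f"event NRI    = {nri_e:.4f}   change in TPR  = {delta_tpr:.4f}")
print(f"nonevent NRI = {nri_ne:.4f}  -change in FPR = {-delta_fpr:.4f}")

# Category-free NRI: compare the raw predicted risks directly.
cf_e, cf_ne = nri_components(risk_old, risk_new, event)
print(f"category-free: event {cf_e:.4f}, nonevent {cf_ne:.4f}")
```

Because every upward move counts as +1 regardless of how many categories it crosses, the same function also shows why a single NRI cannot distinguish a low-to-intermediate shift from a low-to-high shift when there are three or more categories.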
If investigators want to use net reclassification indices, they should calculate confidence intervals using bootstrap methods rather than the published variance formulas. The preferred single-number summary of the prediction increment is the improvement in net benefit.
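The following sketch illustrates the two recommendations above for a single hypothetical treatment threshold: a percentile bootstrap interval for the NRI components, and the change in net benefit as a one-number summary. It assumes the usual decision-curve definition of net benefit, NB = TP/n - (FP/n) * t/(1 - t), which the abstract does not spell out; the threshold, the simulated data, and the function names are hypothetical. For brevity, the bootstrap resamples subjects with the predicted risks held fixed. When the models are fitted to the same data, a fuller analysis would refit both models within each replicate.

```python
import numpy as np


def net_benefit(risk, event, t):
    """Net benefit of treating everyone with risk >= t (assumed decision-curve definition)."""
    risk = np.asarray(risk)
    event = np.asarray(event, dtype=bool)
    treat = risk >= t
    n = len(risk)
    tp = np.sum(treat & event) / n
    fp = np.sum(treat & ~event) / n
    return tp - fp * t / (1 - t)


def two_category_summaries(risk_old, risk_new, event, t):
    """[event NRI (= change in TPR), nonevent NRI (= minus change in FPR), change in NB]."""
    old_hi, new_hi = risk_old >= t, risk_new >= t
    d_tpr = new_hi[event].mean() - old_hi[event].mean()
    d_fpr = new_hi[~event].mean() - old_hi[~event].mean()
    d_nb = net_benefit(risk_new, event, t) - net_benefit(risk_old, event, t)
    return np.array([d_tpr, -d_fpr, d_nb])


def bootstrap_ci(risk_old, risk_new, event, t, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap intervals, resampling subjects with replacement.

    Predicted risks are treated as fixed; refitting the models in each
    replicate would also capture model-estimation variability.
    """
    rng = np.random.default_rng(seed)
    n = len(event)
    stats = np.empty((n_boot, 3))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = two_category_summaries(risk_old[idx], risk_new[idx], event[idx], t)
    alpha = (1 - level) / 2
    return np.quantile(stats, [alpha, 1 - alpha], axis=0)


# Simulated data, purely illustrative (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n = 2000
event = rng.random(n) < 0.2
risk_old = np.clip(0.15 + 0.10 * event + rng.normal(0, 0.10, n), 0, 1)
risk_new = np.clip(risk_old + 0.05 * event - 0.02 * ~event + rng.normal(0, 0.05, n), 0, 1)

t = 0.2  # hypothetical treatment threshold
est = two_category_summaries(risk_old, risk_new, event, t)
lo, hi = bootstrap_ci(risk_old, risk_new, event, t)
for name, e, l, h in zip(["event NRI", "nonevent NRI", "change in NB"], est, lo, hi):
    print(f"{name:>13}: {e:+.4f}  (95% CI {l:+.4f} to {h:+.4f})")
```

With two categories and this definition, the change in net benefit is a weighted combination of the two NRI components: dNB = rho * dTPR - (1 - rho) * dFPR * t/(1 - t), where rho is the event prevalence. It therefore weighs the event and nonevent components by prevalence and by the threshold rather than simply adding them.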
