Development and evaluation of multi-marker risk scores for clinical prognosis

Heart failure research suggests that multiple biomarkers could be combined with relevant clinical information to quantify individual risk more accurately and to guide patient-specific treatment strategies. Statistical methodology is therefore required to derive multi-marker risk scores that yield improved prognostic performance. Developing a prognostic score that combines biomarkers with clinical variables requires specification of an appropriate statistical model, and is most frequently achieved using standard regression methods such as Cox regression. We demonstrate that care is needed in model specification and that maximal use of marker information requires consideration of potential non-linear effects and interactions. The derived multi-marker score can be evaluated using time-dependent receiver operating characteristic (ROC) methods or risk reclassification methods adapted for survival outcomes. We use simulations to compare alternative methods for assessing model accuracy, both to evaluate their power and to quantify the potential loss in accuracy associated with using a sub-optimal regression model to develop the multi-marker score. We illustrate development and evaluation strategies using data from the Penn Heart Failure Study. Based on our results, we recommend that analysts carefully examine the functional form of each component marker and consider plausible forms of effect modification to maximize the prognostic potential of a model-derived multi-marker score.
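The loss in accuracy from a misspecified score can be illustrated with a small simulation. The sketch below is not the authors' simulation design: it assumes uncensored exponential event times, a log-hazard that is quadratic in a single marker and includes a marker-by-covariate interaction, and coefficients chosen purely for illustration. To stay self-contained it scores subjects with the true linear predictor rather than a fitted Cox model, and all function names are hypothetical. It compares a score that uses the marker linearly with one that uses the correct functional form, using the cumulative/dynamic definition of time-dependent AUC (cases have an event by the horizon, controls are still event-free).

```python
import math
import random


def simulate(n, seed=0):
    """Simulate uncensored event times (illustrative assumptions only).

    The log-hazard is quadratic in marker x and includes an x*z interaction;
    the coefficients are arbitrary and are not taken from any study.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)               # continuous biomarker
        z = 1 if rng.random() < 0.5 else 0    # binary clinical variable
        log_hazard = 1.0 * x * x + 1.0 * x * z
        t = rng.expovariate(math.exp(log_hazard))  # exponential event time
        rows.append((x, z, t))
    return rows


def cumulative_dynamic_auc(scores, times, horizon):
    """Empirical AUC at `horizon` for uncensored data.

    Cases are subjects with T <= horizon and controls are subjects with
    T > horizon. Censored data would need a weighted or nearest-neighbour
    estimator instead of this plain pairwise comparison.
    """
    cases = [s for s, t in zip(scores, times) if t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5   # ties count as half-concordant
    return concordant / (len(cases) * len(controls))


data = simulate(600, seed=1)
times = [t for _, _, t in data]

# Score that ignores the non-linear effect and the interaction.
linear_score = [x for x, _, _ in data]
# Score that uses the functional form assumed in the simulation.
full_score = [x * x + x * z for x, z, _ in data]

horizon = 0.5
auc_linear = cumulative_dynamic_auc(linear_score, times, horizon)
auc_full = cumulative_dynamic_auc(full_score, times, horizon)
print(f"AUC({horizon}) linear score: {auc_linear:.3f}")
print(f"AUC({horizon}) full score:   {auc_full:.3f}")
```

In an applied analysis the scores would come from fitted regression models evaluated on held-out data or with cross-validation, and censoring would have to be handled by the accuracy estimator.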
