Statistical Evaluation of a Biomarker

A biomarker may provide a diagnosis, assess disease severity or risk, or guide other clinical interventions such as the use of drugs. Although considerable progress has been made in standardizing the methodology and reporting of randomized trials, less has been accomplished concerning the assessment of biomarkers. Biomarker studies are often presented with poor biostatistics and methodologic flaws that preclude them from providing a reliable and reproducible scientific message. This article discusses a host of issues that, if addressed, can improve the statistical evaluation and reporting of biomarker studies. Investigators should be aware of these issues when designing their studies, editors and reviewers when assessing a manuscript, and readers when interpreting results.
