Quantifying the accuracy of a diagnostic test or marker.

BACKGROUND: In recent years, increasing attention has been paid to the methodology for evaluating (new) tests or biomarkers. A key step in the evaluation of a diagnostic test is the investigation of its accuracy.

CONTENT: We reviewed the literature on how to assess the accuracy of diagnostic tests. Accuracy refers to the degree of agreement between the results of the test under evaluation (the index test) and the results of a reference standard or test. The generally recommended approach is a prospective cohort design in patients who are suspected of having the disease of interest, in which each individual undergoes both the index test and the same reference standard test. This approach presents several challenges, including problems that can arise in verifying the index test results with the preferred reference standard, the choice of cutoff value when the index test result is continuous, and the translation of accuracy results into recommendations for clinical use. This first in a series of 4 reports presents an overview of the designs of single-test accuracy studies and the concepts of specificity, sensitivity, posterior probabilities (i.e., predictive values) for the presence of target disease, ROC curves, and likelihood ratios, all illustrated with empirical data from a study on the diagnosis of suspected deep venous thrombosis. Limitations of the concept of diagnostic accuracy for a single test are also highlighted.

CONCLUSIONS: The prospective cohort design in patients suspected of having the disease of interest is the optimal approach for estimating the accuracy of a diagnostic test. However, the accuracy of a diagnostic index test is not constant; it varies across clinical contexts, disease spectra, and even patient subgroups.
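The accuracy measures named above can be illustrated with a minimal Python sketch. It assumes a dichotomized index test cross-tabulated against the reference standard in a 2x2 table. The class `TwoByTwo`, the functions `accuracy_measures` and `posterior_probability`, and the counts in the usage example are hypothetical and are not taken from the deep venous thrombosis study described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 table: index test result vs. reference standard."""
    tp: int  # index test positive, disease present
    fp: int  # index test positive, disease absent
    fn: int  # index test negative, disease present
    tn: int  # index test negative, disease absent


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return standard single-test accuracy measures.

    Assumes every cell combination used as a denominator is non-zero
    and that specificity is below 1 (otherwise LR+ is undefined).
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Predictive values: posterior probabilities in the studied population.
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # Likelihood ratios for a positive and a negative index test result.
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Apply Bayes' theorem in odds form: posterior odds = prior odds * LR."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    table = TwoByTwo(tp=90, fp=40, fn=10, tn=160)
    measures = accuracy_measures(table)
    for name, value in measures.items():
        print(f"{name}: {value:.3f}")

    # Same likelihood ratio, different prior probabilities of disease.
    for prior in (0.05, 0.20, 0.50):
        post = posterior_probability(prior, measures["lr_pos"])
        print(f"prior {prior:.2f} -> posterior after positive test {post:.3f}")
```

In this sketch, applying the positive likelihood ratio to the prevalence observed in the table reproduces the positive predictive value, and the final loop shows how the posterior probability shifts with the prior probability of disease. The calculation treats the likelihood ratio as fixed across settings, which is a simplifying assumption; as the conclusions note, accuracy itself may vary across clinical contexts and patient subgroups.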
