Efficient statistical tests to compare Youden index: accounting for contingency correlation

The Youden index is widely used in studies evaluating the accuracy of diagnostic tests and the performance of predictive, prognostic, or risk models. However, both the one-sample and the two-independent-sample tests on the Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, which can lead to misleading findings. In addition, no paired-sample test on the Youden index is currently available. This article develops efficient statistical inference procedures for one-sample, independent-sample, and paired-sample tests on the Youden index by accounting for contingency correlation, namely the associations between sensitivity and specificity and between paired samples, which are typically represented in contingency tables. For the one-sample and two-independent-sample tests, the variances are estimated by the delta method and inference is based on the central limit theorem; the resulting estimates are then verified against bootstrap estimates. For the paired-sample test, we show that the estimated covariance of the two sensitivities and of the two specificities can be expressed as a function of the kappa statistic, so the test can be readily carried out. We then demonstrate the accuracy of the estimated variance using a constrained optimization approach. Simulations are performed to evaluate the statistical properties of the derived tests. The proposed approaches yield type I error rates that are more stable at the nominal level, and substantially higher power (efficiency), than Youden's original approach. The simple, explicit large-sample solution therefore performs very well. Because both the asymptotic and the exact bootstrap computations can be readily implemented in common software such as R, the method is broadly applicable to the evaluation of diagnostic tests and model performance.
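As a rough illustration of the ingredients named above (the index itself, a large-sample variance, and a bootstrap check), the following Python sketch computes the Youden index J = sensitivity + specificity - 1 from a 2x2 table. It is not the corrected estimator developed in the article, whose formula is not given here: `naive_variance` is the textbook variance that treats sensitivity and specificity as independent binomial proportions, and `bootstrap_variance` resamples subjects from the whole table. All function names and the example counts are hypothetical.

```python
import random


def youden(tp, fn, fp, tn):
    """Youden index J = sensitivity + specificity - 1 from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return sensitivity + specificity - 1.0


def naive_variance(tp, fn, fp, tn):
    """Large-sample variance of J, ASSUMING sensitivity and specificity
    are independent binomial proportions (no covariance term)."""
    n_pos, n_neg = tp + fn, fp + tn
    se, sp = tp / n_pos, tn / n_neg
    return se * (1 - se) / n_pos + sp * (1 - sp) / n_neg


def bootstrap_variance(tp, fn, fp, tn, n_boot=2000, seed=0):
    """Bootstrap variance of J, resampling subjects from the whole table
    so that the numbers of diseased and non-diseased subjects vary."""
    rng = random.Random(seed)
    cells = ["tp", "fn", "fp", "tn"]
    weights = [tp, fn, fp, tn]
    n = sum(weights)
    estimates = []
    for _ in range(n_boot):
        draw = rng.choices(cells, weights=weights, k=n)
        c = {name: draw.count(name) for name in cells}
        # Skip degenerate resamples where J is undefined.
        if c["tp"] + c["fn"] == 0 or c["fp"] + c["tn"] == 0:
            continue
        estimates.append(youden(c["tp"], c["fn"], c["fp"], c["tn"]))
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)


if __name__ == "__main__":
    table = dict(tp=80, fn=20, fp=10, tn=90)  # made-up counts
    print("J:", youden(**table))
    print("naive variance:", naive_variance(**table))
    print("bootstrap variance:", bootstrap_variance(**table))
```

Comparing the two variance outputs on one's own data gives a quick sanity check of a closed-form variance against resampling, in the spirit of the bootstrap verification mentioned in the abstract.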
