Hierarchical models for ROC curve summary measures: Design and analysis of multi‐reader, multi‐modality studies of medical tests

Comparative studies of the accuracy of diagnostic tests often use designs in which each study participant is examined with two or more of the tests and the resulting diagnostic examinations are interpreted by several readers. The tests are then compared on the basis of a summary index, such as the full or partial area under the receiver operating characteristic (ROC) curve, averaged over the population of readers. The design and analysis of such studies must therefore account for the correlated nature of the diagnostic test results and their interpretations. In this paper, we describe the use of hierarchical modelling for ROC summary measures derived from multi-reader, multi-modality studies. The models allow the variance of the estimates to depend on the actual value of the index, and they account for the correlation in the data both explicitly, via parameters, and implicitly, via the hierarchical structure. After showing how the hierarchical models can be employed in the analysis of data from multi-reader, multi-modality studies, we discuss the design of such studies using the simulation-based Bayesian design approach of Wang and Gelfand (Stat. Sci. 2002; 17(2):193-208). The methodology is illustrated by the analysis of data from a study conducted to evaluate a computer-aided diagnosis tool for screen-film mammography and by the development of design considerations for a multi-reader study comparing display modes for digital mammography. The hierarchical model methodology described in this paper is also applicable to the meta-analysis of ROC studies.
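The abstract has two technical ingredients. The first is an accuracy index whose sampling variance depends on its own value. The second is a simulation-based Bayesian design. Both can be illustrated together in a deliberately simplified sketch. This sketch is not the hierarchical model of the paper, and it rests on several assumptions made for illustration only:

- The within-reader variance of each AUC estimate uses the Hanley and McNeil [7] approximation.
- Each reader's AUC difference between two modalities is d_j = δ + u_j + e_j, where u_j ~ N(0, τ²) is a reader-by-modality effect.
- The correlation ρ between paired AUC estimates, the variance components, both priors and the success criterion (posterior Pr(δ > 0) > 0.95) are hypothetical values.
- The variances are treated as known in the fitting step, so the posterior has a closed form.

In the spirit of the two-prior approach of Wang and Gelfand, the true δ is drawn from a sampling (design) prior and study data are simulated. The data are then analysed under a separate vague fitting prior, and the sketch reports the proportion of simulated studies that meet the criterion. The names `hanley_mcneil_var` and `assurance` are hypothetical.

```python
import math
import random


def hanley_mcneil_var(auc: float, n_pos: int, n_neg: int) -> float:
    """Hanley-McNeil (1982) approximation to the sampling variance of an AUC estimate."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    numerator = (auc * (1.0 - auc)
                 + (n_pos - 1) * (q1 - auc * auc)
                 + (n_neg - 1) * (q2 - auc * auc))
    return numerator / (n_pos * n_neg)


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def assurance(n_readers: int, n_pos: int, n_neg: int,
              auc_ref: float = 0.80,      # hypothetical AUC of the reference modality
              delta_mean: float = 0.05,   # sampling-prior mean of the AUC difference (assumed)
              delta_sd: float = 0.01,     # sampling-prior SD of the AUC difference (assumed)
              tau: float = 0.02,          # SD of reader-by-modality effects (assumed)
              rho: float = 0.5,           # correlation of paired AUC estimates (assumed)
              prior_var: float = 1.0,     # vague fitting-prior variance for delta
              threshold: float = 0.95,    # success if Pr(delta > 0 | data) > threshold
              n_sims: int = 2000,
              seed: int = 1) -> float:
    """Proportion of simulated studies meeting the posterior success criterion."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_sims):
        # Step 1: draw the true mean AUC difference from the sampling (design) prior.
        delta = rng.gauss(delta_mean, delta_sd)
        auc_new = min(max(auc_ref + delta, 0.01), 0.99)
        # Within-reader variance of a paired AUC difference. Both Hanley-McNeil
        # terms depend on the AUC values, so the noise level depends on the index.
        v_ref = hanley_mcneil_var(auc_ref, n_pos, n_neg)
        v_new = hanley_mcneil_var(auc_new, n_pos, n_neg)
        s2 = v_ref + v_new - 2.0 * rho * math.sqrt(v_ref * v_new)
        # Step 2: simulate per-reader differences d_j = delta + u_j + e_j.
        total = 0.0
        for _ in range(n_readers):
            total += delta + rng.gauss(0.0, tau) + rng.gauss(0.0, math.sqrt(s2))
        d_bar = total / n_readers
        # Step 3: analyse under the fitting prior delta ~ N(0, prior_var),
        # treating the variance components as known (conjugate normal update).
        se2 = (tau ** 2 + s2) / n_readers
        post_var = 1.0 / (1.0 / prior_var + 1.0 / se2)
        post_mean = post_var * d_bar / se2
        # Step 4: apply the decision criterion to this simulated study.
        if normal_cdf(post_mean / math.sqrt(post_var)) > threshold:
            successes += 1
    return successes / n_sims


if __name__ == "__main__":
    for n_cases in (50, 100, 200):
        print(n_cases, assurance(n_readers=5, n_pos=n_cases, n_neg=n_cases))
```

To mimic a simulation-based sample size search, repeat the calculation over a grid of reader and case numbers and pick the smallest design whose estimated proportion exceeds a chosen target. A fuller treatment would typically place priors on τ² and the other variance components and fit the hierarchical model by MCMC rather than assume them known.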

[1] N. A. Obuchowski et al. Multireader receiver operating characteristic studies: a comparison of study designs. Academic Radiology, 1995.

[2] F. Wang et al. A simulation-based approach to Bayesian sample size determination for performance under a given model and for separating models. Statistical Science, 17(2):193-208, 2002.

[3] M. S. Pepe et al. Receiver Operating Characteristic Methodology. 2000.

[4] N. A. Obuchowski et al. Computing Sample Size for Receiver Operating Characteristic Studies. Investigative Radiology, 1994.

[5] H. E. Rockette et al. Empiric assessment of parameters that affect the design of multireader receiver operating characteristic studies. Academic Radiology, 1999.

[6] H. Ishwaran et al. A general class of hierarchical ordinal regression models with applications to correlated ROC analysis. 2000.

[7] J. Hanley et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 1982.

[8] C. J. Adcock et al. The choice of sample size. Commentaries. Author's reply. 1997.

[9] W. Zucchini et al. On the statistical analysis of ROC curves. Statistics in Medicine, 1989.

[10] A. Toledano et al. Ordinal regression methodology for ROC curves derived from correlated data. Statistics in Medicine, 1996.

[11] N. Obuchowski et al. Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: An ANOVA approach with dependent observations. 1995.

[12] R. Gonzalez et al. Acute stroke: improved nonenhanced CT detection--benefits of soft-copy interpretation by using variable window width and center level settings. Radiology, 1999.

[13] K. Zou et al. Sample size considerations in observational health care quality studies. Statistics in Medicine, 2002.

[14] C. Adcock. Sample size determination: a review. 1997.

[15] S. Hillis. A comparison of denominator degrees of freedom methods for multiple observer ROC analysis. Statistics in Medicine, 2007.

[16] H. Goldstein et al. Likelihood methods for fitting multilevel models with complex level-1 variation. 2002.

[17] K. S. Berbaum et al. Multireader, multicase receiver operating characteristic methodology: a bootstrap analysis. Academic Radiology, 1995.

[18] X.-H. Zhou et al. Statistical Methods in Diagnostic Medicine. 2002.

[19] K. R. Abrams et al. Bayesian Approaches to Meta-analysis of ROC Curves. Medical Decision Making, 1999.

[20] N. A. Obuchowski et al. A comparison of the Dorfman–Berbaum–Metz and Obuchowski–Rockette methods for receiver operating characteristic (ROC) data. Statistics in Medicine, 2005.

[21] K. S. Berbaum et al. Monte Carlo validation of a multireader method for receiver operating characteristic discrete rating data: factorial experimental design. Academic Radiology, 1998.

[22] S. L. Normand et al. On determination of sample size in hierarchical binomial models. Statistics in Medicine, 2001.

[23] D. Dorfman et al. Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals: Rating-method data. 1969.

[24] C. B. Begg et al. A General Regression Methodology for ROC Curve Estimation. Medical Decision Making, 1988.

[25] E. DeLong et al. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 1988.

[26] X. H. Zhou et al. Empirical Bayes Combination of Estimated Areas under ROC Curves Using Estimating Equations. Medical Decision Making, 1996.

[27] R. F. Wagner et al. Multireader, multicase receiver operating characteristic analysis: an empirical comparison of five methods. Academic Radiology, 2004.

[28] A. Toledano. Three methods for analysing correlated ROC curves: a comparison in real data sets from multi-reader, multi-case studies with a factorial design. Statistics in Medicine, 2003.

[29] K. Berbaum et al. Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method. Investigative Radiology, 1992.

[30] C. Metz et al. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Statistics in Medicine, 1998.

[31] A. Gelfand et al. Efficient parametrisations for normal linear mixed models. 1995.

[32] L. Joseph et al. Bayesian sample size determination for normal means and differences between normal means. 1997.

[33] N. A. Obuchowski et al. Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Statistics in Medicine, 1997.

[34] R. F. Wagner et al. Components-of-variance models and multiple-bootstrap experiments: an alternative method for random-effects, receiver operating characteristic analysis. Academic Radiology, 2000.