Relationship between Roe and Metz simulation model for multireader diagnostic data and Obuchowski‐Rockette model parameters

For the typical diagnostic radiology study design, each case (i.e., patient) undergoes several diagnostic tests (or modalities), and the resulting images are interpreted by several readers. Often, each reader is asked to assign a confidence‐of‐disease rating to each case for each test, and the diagnostic tests are compared with respect to reader‐performance outcomes that are functions of the reader receiver operating characteristic (ROC) curves, such as the area under the ROC curve. These reader‐performance outcomes are frequently analyzed using the Obuchowski and Rockette method, which allows conclusions to generalize to both the reader and case populations. The simulation model proposed by Roe and Metz (RM) in 1997 emulates confidence‐of‐disease data collected from such studies and has been an important tool for empirically evaluating various reader‐performance analysis methods. However, because the RM model parameters are expressed in terms of a continuous decision variable rather than in terms of reader‐performance outcomes, it has not been possible to evaluate the realism of the RM model. I derive the relationships between the RM and Obuchowski‐Rockette model parameters for the empirical area under the ROC curve reader‐performance outcome. These relationships make it possible to evaluate the realism of the RM model parameters and to assess the performance of Obuchowski‐Rockette parameter estimates. An example illustrates the application of the relationships for assessing the performance of a proposed upper one‐sided confidence bound for the Obuchowski‐Rockette test‐by‐reader variance component, which is useful for sample size estimation.
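As a concrete illustration of the reader‐performance outcome named above, the empirical area under the ROC curve for one reader and one test can be computed directly from confidence‐of‐disease ratings. The following is a minimal sketch, not code from the paper: the function name `empirical_auc` and its argument names are hypothetical, and it assumes the standard definition of the empirical AUC as the proportion of (nondiseased, diseased) case pairs in which the diseased case receives the higher rating, with ties counted as one half.

```python
from typing import Sequence


def empirical_auc(nondiseased: Sequence[float], diseased: Sequence[float]) -> float:
    """Empirical AUC for one reader and one test (hypothetical helper).

    Assumes higher ratings indicate greater confidence of disease.
    Each (nondiseased, diseased) pair scores 1 if the diseased case is
    rated higher, 0.5 if the ratings tie, and 0 otherwise; the AUC is
    the mean score over all pairs.
    """
    if len(nondiseased) == 0 or len(diseased) == 0:
        raise ValueError("need at least one nondiseased and one diseased case")

    total = 0.0
    for x in nondiseased:
        for y in diseased:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(nondiseased) * len(diseased))


# Illustrative ratings only (made-up values, not data from any study).
auc = empirical_auc([1, 2, 3], [2, 3, 4, 5])
```

In a multireader study this quantity would be computed once per reader and test combination, and the resulting table of AUC estimates is what the Obuchowski‐Rockette analysis takes as input.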
