Multireader sample size program for diagnostic studies: demonstration and methodology

Abstract. The software “Multireader sample size program for diagnostic studies,” written by Kevin Schartz and Stephen Hillis, performs sample size computations for diagnostic reader-performance studies. The program computes the sample size needed to detect a specified difference in a reader-performance measure between two imaging modalities when the data are analyzed with the methods initially proposed by Dorfman, Berbaum, and Metz and by Obuchowski and Rockette, and later unified and improved by Hillis and colleagues. A commonly used reader-performance measure is the area under the receiver operating characteristic (ROC) curve. The program has an easy-to-use, step-by-step interface that walks the user through entering the needed information. It supports several study designs, inference procedures, hypotheses, and input and output formats, and it runs on Windows, OS X, and Linux. The methodology underlying the software is discussed for the most common diagnostic study design, in which each reader evaluates each case using each modality.
