Evaluation of Pseudoreader Study Designs to Estimate Observer Performance Results as an Alternative to Fully Crossed, Multireader, Multicase Studies.

RATIONALE AND OBJECTIVES To examine the ability of a pseudoreader study design to estimate the observer performance obtained using a traditional fully crossed, multireader, multicase (MRMC) study.

MATERIALS AND METHODS A 10-reader MRMC study with 20 computed tomography datasets was designed to measure observer performance on four novel noise reduction methods. This study served as the foundation for the empirical evaluation of three different pseudoreader designs, each of which used a similar bootstrap approach to generate 2000 realizations from the fully crossed study. The three approaches to generating a pseudoreader varied in the degree to which reader performance was matched and integrated into the pseudoreader design. One simulation was randomly selected as a "mock study" to represent a hypothetical, prospective implementation of the design.

RESULTS Using the traditional fully crossed design, figures of merit (95% CIs) for the four noise reduction methods were 68.2 (55.5-81.0), 69.6 (58.4-80.8), 70.8 (60.2-81.4), and 70.9 (60.4-81.3), respectively. When radiologists' performances on the fourth noise reduction method were used to pair readers during the mock study, there was strong agreement in the estimated figures of merit, with estimates from the pseudoreader design falling within ±3% of those from the fully crossed design.

CONCLUSION Fully crossed MRMC studies require significant investment in resources and time, often resulting in delayed implementation or minimal human testing before dissemination. The pseudoreader approach accelerates study conduct by combining readers judiciously and was found to provide results comparable to the traditional fully crossed design, under strong assumptions about the exchangeability of the readers.
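The general idea of building pseudoreaders from a fully crossed dataset can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: readers are paired at random (not matched on performance), each pair's cases are split evenly between its two members, the figure of merit is a placeholder mean of per-case scores rather than the figure of merit used in the study, and the repeated random re-pairing stands in for the bootstrap procedure described above. The function names `make_pseudoreaders`, `pseudoreader_fom`, and `simulate` are hypothetical.

```python
import random
from statistics import mean


def make_pseudoreaders(n_readers, n_cases, rng):
    """Randomly pair readers and split the cases within each pair.

    Returns a list of pseudoreaders. Each pseudoreader is a dict that maps
    a case index to the index of the real reader who supplies that reading.
    Assumption: random pairing and an even split; with an odd number of
    readers, the last reader after shuffling is left out.
    """
    readers = list(range(n_readers))
    rng.shuffle(readers)
    pseudoreaders = []
    for i in range(0, n_readers - 1, 2):
        a, b = readers[i], readers[i + 1]
        cases = list(range(n_cases))
        rng.shuffle(cases)
        half = n_cases // 2
        assignment = {c: a for c in cases[:half]}
        assignment.update({c: b for c in cases[half:]})
        pseudoreaders.append(assignment)
    return pseudoreaders


def pseudoreader_fom(scores, assignment):
    """Placeholder figure of merit: mean score over the assigned readings."""
    return mean(scores[r][c] for c, r in assignment.items())


def simulate(scores, n_realizations=2000, seed=0):
    """Draw repeated pseudoreader realizations from fully crossed scores.

    scores[r][c] is the score of reader r on case c for one imaging method.
    Returns one reader-averaged figure of merit per realization.
    """
    rng = random.Random(seed)
    n_readers, n_cases = len(scores), len(scores[0])
    estimates = []
    for _ in range(n_realizations):
        prs = make_pseudoreaders(n_readers, n_cases, rng)
        estimates.append(mean(pseudoreader_fom(scores, pr) for pr in prs))
    return estimates


# Synthetic example: 10 readers x 20 cases of made-up scores in [0, 1].
data_rng = random.Random(1)
synthetic = [[data_rng.random() for _ in range(20)] for _ in range(10)]
estimates = simulate(synthetic, n_realizations=200)
fully_crossed = mean(mean(row) for row in synthetic)
print(len(estimates), round(fully_crossed, 3), round(mean(estimates), 3))
```

Comparing the spread of `estimates` against `fully_crossed` gives a rough sense of how much a pseudoreader estimate can deviate from the fully crossed value under these simplified assumptions; a performance-matched pairing rule would replace the random shuffle of readers.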
