Measuring modality ordering consistency of observer performance paradigms

Two observer performance paradigms applied to the same modalities, readers, and cases are said to order the modalities consistently if both yield figure-of-merit differences of the same sign (positive or negative). The aim of this work was to develop a measure of modality ordering consistency. The paradigms considered were receiver operating characteristic (ROC) and jackknife alternative free-response ROC (JAFROC). Clinical FROC data from a previous study were used. ROC ratings were inferred from the FROC ratings using the highest-rating method, in which each case's inferred ROC rating is the highest of its FROC mark ratings. JAFROC analysis of the FROC data and Dorfman-Berbaum-Metz multiple-reader multiple-case (DBM-MRMC) analysis of the inferred ROC data showed significant and consistent differences in the two figures of merit. Additionally, 2000 bootstrap data sets were sampled and each was analyzed by both JAFROC and DBM-MRMC. A positive JAFROC figure-of-merit difference was found to be 101 times more likely when the ROC difference was positive than when it was negative (odds ratio = 101). Valid claims of modality ordering consistency (or inconsistency) are possible only when both figure-of-merit differences are statistically significant; for the bootstrap samples in which both JAFROC and DBM-MRMC yielded significant differences, there were no inconsistent orderings. The effect of artificially degrading JAFROC performance was also investigated, and the odds ratio was found to be more sensitive to the degradation. The results of this work are likely optimistic; a more realistic test of modality ordering consistency would require two separate studies (FROC and ROC) using the same readers and cases.
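As an illustration of the highest-rating inference step, here is a minimal Python sketch. The data layout (one list of FROC mark ratings per case) and the names infer_roc_ratings and unmarked_rating are assumptions made for this example, not the study's code; in particular, the rating assigned to unmarked cases is a convention that varies between implementations.

```python
def infer_roc_ratings(froc_marks, unmarked_rating=0):
    """Highest-rating method: each case's inferred ROC rating is the
    maximum of its FROC mark ratings; unmarked cases receive a rating
    below every possible mark rating (here 0, an illustrative choice)."""
    return [max(marks) if marks else unmarked_rating for marks in froc_marks]

# Example: three cases; the second case carries no marks.
print(infer_roc_ratings([[2, 4, 1], [], [3]]))  # -> [4, 0, 3]
```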
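The bootstrap consistency analysis reduces to a 2x2 table of sign agreements between the two figure-of-merit differences. A hedged sketch follows, assuming sequences delta_jafroc and delta_roc hold the differences (one pair per bootstrap data set) produced by JAFROC and DBM-MRMC respectively; the function name, the tie-breaking rule for zero differences, and the continuity correction are illustrative choices, not taken from the study.

```python
def ordering_odds_ratio(delta_jafroc, delta_roc):
    """Odds ratio relating the signs of the two figure-of-merit differences
    across bootstrap samples: the odds of a positive JAFROC difference given
    a positive ROC difference, divided by the same odds given a non-positive
    ROC difference."""
    a = b = c = d = 0
    for dj, dr in zip(delta_jafroc, delta_roc):
        if dr > 0:
            if dj > 0:
                a += 1  # both differences positive (consistent ordering)
            else:
                b += 1  # ROC positive, JAFROC non-positive (inconsistent)
        else:
            if dj > 0:
                c += 1  # ROC non-positive, JAFROC positive (inconsistent)
            else:
                d += 1  # both differences non-positive (consistent ordering)
    # Haldane-Anscombe correction (add 0.5 to each cell) guards against
    # division by zero when a cell is empty.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
```

Restricting the counts to the bootstrap samples in which both analyses yielded statistically significant differences gives the stricter consistency check described above, under which no inconsistent orderings were observed.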
