Selection of a rating scale in receiver operating characteristic studies: some remaining issues.

RATIONALE AND OBJECTIVES This study compares the ratings of a group of readers who used two different rating scales in a receiver operating characteristic (ROC) study, and clarifies some remaining issues in selecting a rating scale for such studies.

MATERIALS AND METHODS We reanalyzed a previously conducted ROC study in which readers used both a 5-point and a 101-point scale to identify abdominal masses in 95 cases. Summary statistics include the distribution of scores by reader for each rating scale, the proportion of scores tied on the 5-point scale that were correctly resolved on the 101-point scale, and the proportion of paired normal-abnormal cases in which the two rating scales led to a different selection of the abnormal case.

RESULTS As a group, the readers used 84 of the rating categories available on the 101-point scale, but the categories used differed among individual readers. All readers tended to resolve the majority of ties on the 5-point scale in favor of correct decisions and to maintain correct decisions when the more refined scale was used.

CONCLUSIONS The reanalysis presented here provides additional evidence that readers in an ROC study can adjust to a 101-point scale and that the use of such a refined scale can increase discriminative ability. However, the choice of an appropriate scale should also take into account the underlying abnormality in question and relevant clinical considerations.
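The two pairwise summary statistics described in the methods can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the dictionary keys, and the toy scores are hypothetical. It also assumes that a tie is "correctly resolved" when the abnormal case receives the strictly higher 101-point score, and that a "different selection" means any normal-abnormal pair whose ordering (abnormal higher, lower, or tied) differs between the two scales.

```python
from itertools import product


def pair_outcome(normal_score, abnormal_score):
    """Return 1 if the abnormal case is rated higher, -1 if lower, 0 if tied."""
    if abnormal_score > normal_score:
        return 1
    if abnormal_score < normal_score:
        return -1
    return 0


def compare_scales(normal_5, abnormal_5, normal_101, abnormal_101):
    """Compare one reader's ratings on two scales over all normal-abnormal pairs.

    normal_5[i] and normal_101[i] are the two ratings of normal case i;
    abnormal_5[j] and abnormal_101[j] are the two ratings of abnormal case j.
    """
    total_pairs = 0
    ties_on_5_point = 0
    ties_resolved_correctly = 0
    different_selection = 0

    for i, j in product(range(len(normal_5)), range(len(abnormal_5))):
        total_pairs += 1
        outcome_5 = pair_outcome(normal_5[i], abnormal_5[j])
        outcome_101 = pair_outcome(normal_101[i], abnormal_101[j])

        if outcome_5 == 0:
            ties_on_5_point += 1
            # Assumption: "correct" means the abnormal case scores higher.
            if outcome_101 == 1:
                ties_resolved_correctly += 1

        # Assumption: any change in pair ordering counts as a different selection.
        if outcome_5 != outcome_101:
            different_selection += 1

    return {
        "total_pairs": total_pairs,
        "ties_on_5_point": ties_on_5_point,
        # None when the reader produced no ties on the 5-point scale.
        "prop_ties_resolved_correctly": (
            ties_resolved_correctly / ties_on_5_point if ties_on_5_point else None
        ),
        "prop_different_selection": (
            different_selection / total_pairs if total_pairs else None
        ),
    }


# Hypothetical toy ratings for a single reader (not data from the study).
result = compare_scales(
    normal_5=[1, 2, 3],
    abnormal_5=[3, 4],
    normal_101=[5, 30, 55],
    abnormal_101=[60, 40],
)
print(result)
```

In a multi-reader study such a function would be applied separately to each reader's ratings, since the abstract reports that individual readers used the 101-point scale differently.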
