Application of threshold-bias independent analysis to eye-tracking and FROC data.

RATIONALE AND OBJECTIVES Studies of medical image interpretation have focused either on assessing radiologists' performance using, for example, the receiver operating characteristic (ROC) paradigm, or on assessing the interpretive process by analyzing their eye-tracking (ET) data. Analysis of ET data has not benefited from threshold-bias independent figures of merit (FOMs) analogous to the area under the ROC curve. The aim was to demonstrate the feasibility of such FOMs and to measure the agreement between FOMs derived from free-response ROC (FROC) and ET data.
METHODS Eight expert breast radiologists interpreted a case set of 120 two-view mammograms while eye-position data and FROC data were continuously collected during the interpretation interval. Regions that attracted prolonged (>800 ms) visual attention were considered to be virtual marks, and ratings based on the dwell and approach-rate (inverse of time-to-hit) were assigned to them. The virtual ratings were used to define threshold-bias independent FOMs in a manner analogous to the area under the trapezoidal alternative FROC (AFROC) curve (0 = worst, 1 = best). Agreement at the case level (0.5 = chance, 1 = perfect) was measured using the jackknife, and 95% confidence intervals (CI) for the FOMs and agreement were estimated using the bootstrap.
RESULTS The AFROC mark-ratings' FOM was largest at 0.734 (CI 0.65-0.81), followed by the dwell FOM at 0.460 (0.34-0.59) and then by the approach-rate FOM at 0.336 (0.25-0.46). The differences between the FROC mark-ratings' FOM and the perceptual FOMs were significant (P < .05). All pairwise agreements were significantly better than chance: ratings vs. dwell 0.707 (0.63-0.88), dwell vs. approach-rate 0.703 (0.60-0.79) and ratings vs. approach-rate 0.606 (0.53-0.68). The ratings vs. approach-rate agreement was significantly smaller than the dwell vs. approach-rate agreement (P = .008).
CONCLUSIONS Leveraging current methods developed for analyzing observer performance data could complement current ways of analyzing ET data and lead to new insights.
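
As a rough illustration of the kind of FOM described in METHODS, the sketch below computes a trapezoidal AFROC-style area as a Wilcoxon statistic over (non-diseased case, lesion) pairs and attaches a percentile bootstrap interval. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the convention that unmarked lesions and mark-free cases receive a rating of negative infinity, and the independent resampling of non-diseased cases and lesions are all assumptions made for illustration. The same functions would apply to real mark ratings or to virtual ratings such as dwell or approach-rate.

```python
import random

# Assumption: an unmarked lesion, or a non-diseased case with no marks,
# is given a rating below every real rating.
UNMARKED = float("-inf")


def psi(fp_rating, lesion_rating):
    """Wilcoxon kernel: 1 if the lesion outranks the false positive, 0.5 on a tie."""
    if lesion_rating > fp_rating:
        return 1.0
    if lesion_rating == fp_rating:
        return 0.5
    return 0.0


def afroc_fom(normal_max_fp, lesion_ratings):
    """Mean kernel value over all (non-diseased case, lesion) pairs.

    normal_max_fp: highest false-positive rating on each non-diseased case.
    lesion_ratings: rating of each lesion (UNMARKED if never marked).
    """
    total = sum(psi(fp, les) for fp in normal_max_fp for les in lesion_ratings)
    return total / (len(normal_max_fp) * len(lesion_ratings))


def bootstrap_ci(normal_max_fp, lesion_ratings, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the FOM.

    Assumption: non-diseased cases and lesions are resampled independently,
    i.e. each lesion is treated as if it were its own diseased case.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        normals = rng.choices(normal_max_fp, k=len(normal_max_fp))
        lesions = rng.choices(lesion_ratings, k=len(lesion_ratings))
        stats.append(afroc_fom(normals, lesions))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical toy ratings, not data from the study.
    normals = [1, 2, UNMARKED, 3]
    lesions = [4, 2, UNMARKED]
    print(afroc_fom(normals, lesions))
    print(bootstrap_ci(normals, lesions, n_boot=500))
```

Because the kernel depends only on rank order, any monotonic rescaling of the ratings leaves the FOM unchanged, which is the sense in which such a measure is independent of threshold bias.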
