Modeling rater diagnostic skills in binary classification processes

Many disease diagnoses involve subjective judgments by qualified raters. For example, when inspecting a mammogram, MRI, or ultrasound image, the clinician becomes part of the measuring instrument. To reduce diagnostic errors and improve the quality of diagnoses, it is necessary to assess raters' diagnostic skills and to improve those skills over time. This paper focuses on a subjective binary classification process and proposes a hierarchical model that links data on rater opinions with patients' true disease-development outcomes. The model quantifies the effects of rater diagnostic skills (bias and magnifier) and of patient latent disease severity on the rating results. A Bayesian Markov chain Monte Carlo (MCMC) algorithm is developed to estimate these parameters. Because the model is linked to patients' true disease outcomes, rater-specific sensitivity and specificity can be estimated from the MCMC samples. Cost theory is used to identify poor- and strong-performing raters and to guide adjustment of rater bias and diagnostic magnifier to improve rating performance. Furthermore, the diagnostic magnifier is shown to be a key parameter for representing a rater's diagnostic ability, because a rater with a larger diagnostic magnifier has a uniformly better receiver operating characteristic (ROC) curve as the diagnostic bias varies. A simulation study is conducted to evaluate the proposed methods, and the methods are illustrated with a mammography example.
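The ROC claim can be illustrated numerically. The sketch below does not reproduce the paper's model, which the abstract does not specify. It assumes a probit form in which a rater calls a case positive with probability Phi(bias + magnifier * severity), a standard normal latent severity, and a patient who is truly diseased when severity exceeds zero. All function names and parameter values are hypothetical. Under these assumptions, sweeping the bias for a fixed magnifier traces that rater's ROC curve.

```python
import math
import numpy as np

# Assumption: probit link. The abstract does not state the link function.
_erf = np.vectorize(math.erf)


def normal_cdf(x):
    """Standard normal CDF evaluated elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


# Assumption: latent severity is standard normal; integrate over a fixed grid.
SEVERITY = np.linspace(-6.0, 6.0, 2001)
WEIGHTS = np.exp(-0.5 * SEVERITY**2)
WEIGHTS /= WEIGHTS.sum()
# Assumption: a patient is truly diseased when latent severity exceeds 0.
DISEASED = SEVERITY > 0.0


def sensitivity_specificity(bias, magnifier):
    """Sensitivity and specificity of one hypothetical rater."""
    p_positive = normal_cdf(bias + magnifier * SEVERITY)
    sens = np.sum(WEIGHTS[DISEASED] * p_positive[DISEASED]) / np.sum(WEIGHTS[DISEASED])
    spec = np.sum(WEIGHTS[~DISEASED] * (1.0 - p_positive[~DISEASED])) / np.sum(
        WEIGHTS[~DISEASED]
    )
    return float(sens), float(spec)


def roc_curve(magnifier, biases=np.linspace(-8.0, 8.0, 161)):
    """Trace (false positive rate, true positive rate) as the bias varies."""
    points = [sensitivity_specificity(b, magnifier) for b in biases]
    fpr = np.array([1.0 - spec for _, spec in points])
    tpr = np.array([sens for sens, _ in points])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


if __name__ == "__main__":
    for magnifier in (0.5, 1.0, 2.0):
        fpr, tpr = roc_curve(magnifier)
        print(f"magnifier={magnifier:.1f}  AUC={auc(fpr, tpr):.3f}")
```

In this toy setting the bias only moves a rater along a curve, while the magnifier determines which curve the rater is on. That is one way to read the abstract's statement that the magnifier characterizes diagnostic ability.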
