Statistical approaches for modeling radiologists' interpretive performance.

Although much research has examined how interpretive volume influences radiologists' performance in interpreting mammograms, the published literature has not reached consensus on the volume standards required for optimal mammography accuracy. One potential contributing factor is that studies have used different statistical approaches to address the same underlying scientific question. Such studies rely on multiple mammography interpretations from each radiologist in a sample; an important statistical issue is therefore how to account appropriately for dependence, or correlation, among interpretations made by (or clustered within) the same radiologist. The aim of this review is to raise awareness of differences between statistical approaches used to analyze clustered data. Statistical frameworks commonly used to model binary measures of interpretive performance are reviewed, with a focus on two broad classes of regression frameworks: marginal and conditional models. Although both frameworks account for dependence in clustered data, the interpretations of their parameters differ: marginal model parameters describe population-averaged associations across radiologists, whereas conditional model parameters describe associations for a given radiologist. Hence, the choice of statistical framework may (implicitly) dictate the scientific question being addressed. Additional statistical issues that influence estimation and inference are also discussed, together with their potential impact on the scientific interpretation of the analysis. This work was motivated by ongoing research in the National Cancer Institute's Breast Cancer Surveillance Consortium; however, the ideas are relevant to a broad range of settings in which researchers seek to identify and understand sources of variability in clustered binary outcomes.
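Consider a hypothetical random-intercept logistic model, logit P(Y_ij = 1 | x_ij, b_i) = β0 + β1 x_ij + b_i with b_i ~ N(0, σ²). Here i indexes radiologists and x_ij is a binary covariate. The coefficient β1 is the conditional (radiologist-specific) log odds ratio. Averaging over b_i gives the marginal (population-averaged) log odds ratio, which is attenuated toward zero as σ grows. The minimal Python sketch below illustrates this under those assumptions. It computes the marginal value by numerical integration and compares it with the approximation β1 / sqrt(1 + c²σ²), c = 16√3/(15π), of Zeger, Liang and Albert (1988). The model, parameter values and function names are illustrative only.

```python
import math

# Constant in the Zeger-Liang-Albert (1988) approximation for logistic-normal models.
C = 16 * math.sqrt(3) / (15 * math.pi)


def expit(z: float) -> float:
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def marginal_prob(eta: float, sigma: float, n_grid: int = 4001, width: float = 8.0) -> float:
    """P(Y = 1) averaged over a N(0, sigma^2) random intercept (hypothetical model).

    Integrates expit(eta + b) * phi(b) over b in [-width*sigma, width*sigma]
    using the trapezoidal rule.
    """
    if sigma == 0:
        return expit(eta)
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * h
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        total += weight * expit(eta + b) * density
    return total * h


def marginal_log_or(beta0: float, beta1: float, sigma: float) -> float:
    """Population-averaged log odds ratio for x = 1 versus x = 0."""
    p0 = marginal_prob(beta0, sigma)
    p1 = marginal_prob(beta0 + beta1, sigma)
    return logit(p1) - logit(p0)


def approx_marginal_log_or(beta1: float, sigma: float) -> float:
    """Closed-form attenuation approximation beta1 / sqrt(1 + C^2 sigma^2)."""
    return beta1 / math.sqrt(1.0 + C ** 2 * sigma ** 2)


if __name__ == "__main__":
    beta0, beta1 = -2.0, 1.0  # hypothetical conditional (radiologist-specific) parameters
    for sigma in (0.0, 0.5, 1.0, 2.0):
        print(
            f"sigma={sigma:.1f}  conditional={beta1:.3f}  "
            f"marginal={marginal_log_or(beta0, beta1, sigma):.3f}  "
            f"approx={approx_marginal_log_or(beta1, sigma):.3f}"
        )
```

The script prints the conditional, numerically marginalized and approximated log odds ratios for several values of σ. The marginal value shrinks as between-radiologist variability increases. This is one reason a marginal analysis and a conditional analysis of the same clustered data can produce different coefficients without either being wrong.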
