Inter- and intraradiologist variability in the BI-RADS assessment and breast density categories for screening mammograms.

OBJECTIVE The aim of this study was to evaluate reader variability in screening mammograms according to the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) assessment and breast density categories.

METHODS A stratified random sample of 100 mammograms was selected from a population-based breast cancer screening programme in Barcelona, Spain: 13 histopathologically confirmed breast cancers, 51 with true-negative results and 36 with false-positive results. Twenty-one expert radiologists from radiological units of breast cancer screening programmes in Catalonia, Spain, reviewed the mammography images twice within a 6-month interval. The readers classified each mammogram using the BI-RADS assessment and breast density categories. Inter- and intraradiologist agreement was assessed using the percentage of concordance and the kappa (κ) statistic.

RESULTS Fair interobserver agreement was observed for the BI-RADS assessment [κ=0.37, 95% confidence interval (CI) 0.36-0.38]. When the categories were collapsed according to whether additional evaluation was required (Categories III, 0, IV and V) or not (Categories I and II), moderate agreement was found (κ=0.53, 95% CI 0.52-0.54). Intraobserver agreement for the BI-RADS assessment was moderate using all categories (κ=0.53, 95% CI 0.50-0.55) and substantial for recall (κ=0.66, 95% CI 0.63-0.70). Regarding breast density, inter- and intraradiologist agreement was substantial (κ=0.73, 95% CI 0.72-0.74 and κ=0.69, 95% CI 0.68-0.70, respectively).

CONCLUSION We observed substantial intraobserver agreement in the BI-RADS assessment but only moderate interobserver agreement. Both inter- and intraobserver agreement in the mammographic interpretation of breast density were substantial.

ADVANCES IN KNOWLEDGE Educational efforts should be made to decrease radiologists' variability in BI-RADS assessment interpretation in population-based breast screening programmes.
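The agreement statistic reported throughout the abstract is Cohen's kappa, which corrects the raw percentage of concordance for the agreement two readers would reach by chance alone. A minimal stdlib-only sketch of the unweighted statistic, using hypothetical BI-RADS labels for two readers (the labels and sample size are illustrative, not the study's data):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e is the agreement expected by
    chance given each rater's marginal category frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # observed agreement: fraction of cases with identical labels
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # chance agreement: product of marginal frequencies, summed over categories
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# hypothetical BI-RADS assessments from two readers on 10 mammograms
reader_1 = ["II", "I", "IV", "II", "0", "II", "I", "III", "II", "I"]
reader_2 = ["II", "II", "IV", "II", "II", "0", "I", "III", "I", "I"]
print(round(cohen_kappa(reader_1, reader_2), 2))  # → 0.44
```

Here the readers agree on 6 of 10 cases (60% concordance), yet κ is only 0.44, which illustrates why the study reports kappa alongside percentage of concordance: chance agreement inflates the raw percentage. On the Landis and Koch scale used in the abstract, 0.21-0.40 is "fair", 0.41-0.60 "moderate" and 0.61-0.80 "substantial".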
