Evaluation of mammographic density patterns: reproducibility and concordance among scales

Background: Increased mammographic breast density is a moderate risk factor for breast cancer. Different scales have been proposed for classifying mammographic density. This study sought to assess intra-rater agreement for the most widely used scales (Wolfe, Tabár, BI-RADS and Boyd) and to compare them in terms of classifying mammograms as high- or low-density.

Methods: The study covered 3572 mammograms drawn from women included in the DDM-Spain study, carried out in seven Spanish Autonomous Regions. Each mammogram was read by an expert radiologist and classified using the Wolfe, Tabár, BI-RADS and Boyd scales. In addition, 375 randomly selected mammograms were read a second time to estimate intra-rater agreement for each scale using the kappa statistic. Owing to the ordinal nature of the scales, weighted kappa was computed. The entire set of 3572 mammograms was used to calculate agreement among the different scales in classifying high- versus low-density patterns, with the kappa statistic being computed on a pair-wise basis. High density was defined as follows: percentage of dense tissue greater than 50% for the Boyd scale, the "heterogeneously dense" and "extremely dense" categories for BI-RADS, categories P2 and DY for the Wolfe scale, and categories IV and V for the Tabár scale.

Results: There was good agreement between the first and second readings, with weighted kappa values of 0.84 for the Wolfe, 0.71 for the Tabár, 0.90 for the BI-RADS, and 0.92 for the Boyd scale. Furthermore, there was substantial agreement among the different scales in classifying high- versus low-density patterns. Agreement was almost perfect between the quantitative scales, Boyd and BI-RADS, and good for those based on the observed pattern, i.e., Tabár and Wolfe (kappa 0.81). Agreement was lower when comparing a pattern-based scale (Wolfe or Tabár) with a quantitative scale (BI-RADS or Boyd). Moreover, the Wolfe and Tabár scales classified more mammograms in the high-risk group (46.61% and 37.32%, respectively), while this percentage was lower for the quantitative scales (21.89% for BI-RADS and 21.86% for Boyd).

Conclusions: Visual scales of mammographic density show high reproducibility when appropriate training is provided. Their ability to distinguish between high and low risk renders them useful for routine use by breast cancer screening programs. Quantitative scales are more specific than pattern-based scales in classifying populations in the high-risk group.
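The two agreement measures described in the Methods can be illustrated with a short sketch. This is not the authors' code: the abstract does not say which weighting scheme or software was used, so the linear weights below are an assumption (quadratic weights are the other common choice), and the function names, category labels, and the representation of a Boyd reading as a percentage are hypothetical. Only the high-density cut-offs are taken from the text.

```python
def weighted_kappa(first, second, categories, weights="linear"):
    """Cohen's kappa for two readings of the same items.

    categories: the scale's categories in ordinal order.
    weights: "unweighted", "linear" or "quadratic" (assumed options;
    the abstract does not state which scheme the study used).
    """
    k = len(categories)
    n = len(first)
    if k < 2 or n == 0 or n != len(second):
        raise ValueError("need >= 2 categories and two equal-length, non-empty readings")
    index = {c: i for i, c in enumerate(categories)}

    # Joint proportions of (first reading, second reading).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "unweighted":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when all readings fall in one category")
    return 1.0 - obs / exp


# High-density definitions as stated in the Methods; the label spellings
# used as dictionary values are hypothetical encodings.
HIGH_DENSITY = {
    "wolfe": {"P2", "DY"},
    "tabar": {"IV", "V"},
    "birads": {"heterogeneously dense", "extremely dense"},
}


def is_high_density(scale, value):
    """Dichotomise one reading. For "boyd", value is assumed to be percent dense tissue."""
    if scale == "boyd":
        return value > 50
    return value in HIGH_DENSITY[scale]


def pairwise_high_low_kappa(scale_a, readings_a, scale_b, readings_b):
    """Unweighted kappa between two scales after high/low dichotomisation."""
    a = [is_high_density(scale_a, v) for v in readings_a]
    b = [is_high_density(scale_b, v) for v in readings_b]
    return weighted_kappa(a, b, categories=[False, True], weights="unweighted")
```

For intra-rater agreement, `first` and `second` would hold the two readings of the re-read mammograms on one scale; for the between-scale comparison, `pairwise_high_low_kappa` would be called once per pair of scales on the full set.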
