Validation of Results from Knowledge Discovery: Mass Density as a Predictor of Breast Cancer

The purpose of our study was to identify and quantify the association between high breast mass density and breast malignancy using inductive logic programming (ILP) and conditional probabilities, and to validate this association in an independent dataset. We ran the Aleph ILP system on 62,219 mammographic abnormalities, configured to generate 10,000 rules per malignant finding with recall >5% and precision >25%. Aleph reported the best rule for each malignant finding, and a total of 80 unique rules were learned. A radiologist reviewed all rules and identified those that were potentially interesting. High breast mass density appeared in 24% of the learned rules. We confirmed each interesting rule by calculating the probability of malignancy given each mammographic descriptor; by this measure, high mass density was the fifth-highest-ranked predictor. To validate the association between mass density and malignancy in an independent dataset, we collected data from 180 consecutive breast biopsies performed between 2005 and 2007. We fit a logistic regression model with benign or malignant outcome as the dependent variable while controlling for potentially confounding factors, and calculated odds ratios based on dichotomized variables. In this model, the independent predictors high breast mass density (OR 6.6, CI 2.5–17.6), irregular mass shape (OR 10.0, CI 3.4–29.5), spiculated mass margin (OR 20.4, CI 1.9–222.8), and subject age (β = 0.09, p < 0.0001) significantly predicted malignancy. Both ILP and conditional probabilities show that high breast mass density is an important adjunct predictor of malignancy, and this association is confirmed in an independent dataset of prospectively collected mammographic findings.
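The two quantities reported above, the probability of malignancy given a descriptor and the odds ratio from a logistic model, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names are hypothetical, the counts passed to `conditional_probability` are made up for illustration, and the sketch assumes Wald-type intervals at the 95% level (z = 1.96) and age measured in years, neither of which the abstract states.

```python
import math


def conditional_probability(n_malignant_with, n_total_with):
    """Estimate P(malignant | descriptor present) as a relative frequency."""
    if n_total_with <= 0:
        raise ValueError("need at least one finding with the descriptor")
    return n_malignant_with / n_total_with


def odds_ratio_with_ci(beta, se, z=1.96):
    """Turn a logistic coefficient and its standard error into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(lower, upper, z=1.96):
    """Back out (beta, se) from a Wald interval given on the odds-ratio scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)


# Hypothetical counts, NOT taken from the study: 30 malignant findings
# among 120 findings that carry a given descriptor.
p_malignant = conditional_probability(30, 120)

# Consistency check on the reported interval for high mass density
# (CI 2.5-17.6): a Wald interval is symmetric on the log scale, so its
# midpoint should land near the log of the reported OR of 6.6.
beta, se = coefficient_from_ci(2.5, 17.6)
or_density, lo, hi = odds_ratio_with_ci(beta, se)

# Age enters as a continuous term (beta = 0.09). Assuming the unit is
# years, the implied odds multiplier per decade is exp(10 * 0.09).
or_per_decade = math.exp(10 * 0.09)

print(f"P(malignant | descriptor) = {p_malignant:.2f}")
print(f"Recovered OR for high density = {or_density:.2f} ({lo:.1f}, {hi:.1f})")
print(f"Odds multiplier per decade of age = {or_per_decade:.2f}")
```

The round trip through `coefficient_from_ci` only checks that a reported odds ratio and its interval are mutually consistent; it does not reproduce the adjusted model, which would require the underlying biopsy data.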
