Using Data Mining Techniques to Support Breast Cancer Diagnosis

Breast cancer research has produced a growing number of computer-aided diagnostic systems aimed at reducing false-positive diagnoses. In this work, we present a data-mining-based approach that may support oncologists in breast cancer classification and diagnosis. We used a reliable database of 410 images containing microcalcifications, masses, and normal tissue findings. We applied two feature extraction techniques, the gray level co-occurrence matrix and the gray level run length matrix, and used several data mining classifiers for classification. The results showed a positive predictive value of approximately 70%, with accuracy above 65% for distinguishing mammographic findings and above 75% for classifying the BI-RADS® scale. The Random Forest classifier was the best predictive method and also performed best at distinguishing microcalcifications.
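The two texture descriptors named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the 4x4 `patch`, the horizontal offset of one pixel, the four gray levels, and the particular features computed (contrast, energy, homogeneity, short run emphasis) are all assumptions chosen for illustration. In a pipeline like the one described, a feature vector of this kind would be computed per image region and then passed to a classifier such as Random Forest, which is not shown here.

```python
from itertools import groupby


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """A few common texture features derived from a normalized GLCM."""
    idx = range(len(p))
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i in idx for j in idx),
        "energy": sum(p[i][j] ** 2 for i in idx for j in idx),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i in idx for j in idx),
    }


def glrlm(image, levels):
    """Horizontal gray level run length matrix.

    rlm[g][k - 1] is the number of runs of gray level g with length k.
    """
    max_run = len(image[0])
    rlm = [[0] * max_run for _ in range(levels)]
    for row in image:
        for level, run in groupby(row):
            rlm[level][len(list(run)) - 1] += 1
    return rlm


def short_run_emphasis(rlm):
    """Weights short runs more heavily; higher values suggest finer texture."""
    n_runs = sum(map(sum, rlm))
    weighted = sum(n / (k + 1) ** 2 for row in rlm for k, n in enumerate(row))
    return weighted / n_runs


# Hypothetical 4x4 patch quantized to 4 gray levels (0..3).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

p = glcm(patch, levels=4)
features = glcm_features(p)
features["short_run_emphasis"] = short_run_emphasis(glrlm(patch, levels=4))
print(features)
```

Real mammograms would first be quantized to a manageable number of gray levels, and the matrices are usually computed for several offsets and directions, then averaged or concatenated; those choices are left out of this sketch.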
