Applying Machine Learning Algorithms for Early Diagnosis and Prediction of Breast Cancer Risk

As technology advances, the deadly diseases threatening human survival are increasing at the same pace. Breast cancer is the second leading cause of death among women, yet it is also among the most curable cancers when diagnosed early. There is therefore a pressing need for automated breast cancer diagnosis in everyday health applications. This paper applies WrapperSubsetEval, a dimensionality reduction technique offered by the Weka tool, to two benchmark cancer datasets, the Wisconsin dataset and the Portuguese “Breast Cancer Digital Repository” (BCDR), in combination with four leading data mining algorithms from the literature: Naive Bayes, J48, k-NN and SVM. The final experiments, carried out in MATLAB and Weka, show that on the Wisconsin dataset accuracy improved from 92.6186% to 97.0123% for Naive Bayes, from 92.9701% to 96.8366% for J48, from 96.1336% to 97.3638% for k-NN, and from 97.891% to 97.9123% for SVM. On the BCDR-D01 dataset, accuracy improved from 87.4126% to 89.5105% for Naive Bayes, from 80.4196% to 90.9091% for J48, from 93.7063% to 97.9021% for k-NN, and from 91.6084% to 95.1049% for SVM.
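The general idea behind wrapper-based feature subset selection can be illustrated with a minimal sketch. This is not Weka's WrapperSubsetEval and does not reproduce the paper's experiments: it assumes a greedy forward search, a small k-NN classifier scored by leave-one-out accuracy, and a synthetic toy dataset in which only the first feature is informative. All function names (`knn_predict`, `loo_accuracy`, `forward_wrapper_selection`, `make_toy_data`) are hypothetical and introduced only for illustration.

```python
import random


def knn_predict(train_X, train_y, x, features, k=3):
    """Predict the label of x by majority vote of its k nearest neighbours,
    measuring squared Euclidean distance on the selected features only."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of k-NN restricted to the given features."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], features, k) == y[i]:
            correct += 1
    return correct / len(X)


def forward_wrapper_selection(X, y, k=3):
    """Greedy forward search: repeatedly add the feature that most improves
    the classifier's own accuracy, and stop when no feature helps."""
    remaining = set(range(len(X[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [
            (loo_accuracy(X, y, selected + [f], k), f)
            for f in sorted(remaining)
        ]
        # Prefer the higher score; break ties by the lower feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


def make_toy_data(n=60, seed=0):
    """Synthetic data (an assumption, not the paper's datasets):
    feature 0 separates the two classes, features 1-3 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = rng.gauss(4.0 * label, 0.3)
        noise = [rng.gauss(0.0, 5.0) for _ in range(3)]
        X.append([informative] + noise)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    all_features = list(range(len(X[0])))
    print("accuracy with all features:", loo_accuracy(X, y, all_features))
    selected, score = forward_wrapper_selection(X, y)
    print("selected features:", selected, "accuracy:", score)
```

The defining property of a wrapper, as opposed to a filter, is that each candidate subset is scored by the accuracy of the target classifier itself, which is why the selected subset can differ from one algorithm to another.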
