Improving breast cancer classification with mammography, supported on an appropriate variable selection analysis

This work addresses variable selection in the context of breast cancer classification with mammography. We used a comprehensive repository of feature vectors, including a hybrid subset that gathers image-based and clinical features. The aim was to gather experimental evidence on variable selection in terms of cardinality and type, and to find the classification scheme that achieves the best Area Under the Receiver Operating Characteristic Curve (AUC) scores using the ranked feature subsets. We evaluated and classified a total of 300 feature subsets formed by applying the Chi-Square Discretization, Information-Gain, One-Rule and RELIEF methods in association with Feed-Forward Backpropagation Neural Network (FFBP), Support Vector Machine (SVM) and Decision Tree J48 (DTJ48) Machine Learning Algorithms (MLA), and compared their performance based on AUC scores. A variable selection analysis was performed for Single-View Ranking and Multi-View Ranking groups of features. Feature subsets representing Microcalcifications (MCs), Masses, and both MCs and Masses achieved AUC scores of 0.91, 0.954 and 0.934, respectively. Experimental evidence demonstrated that classification performance improved when image-based and clinical features were combined. The most important clinical and image-based features were StromaDistortion and Circularity, respectively. Other features that were less important but worth using because of their consistency were Contrast, Perimeter, Microcalcification, Correlation and Elongation.
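To make the ranking-and-evaluation idea concrete, the following is a minimal sketch, not the authors' pipeline. It ranks already-discretized features by information gain and computes AUC as the fraction of correctly ordered (positive, negative) pairs. The function names, the toy rows, the feature called Noise, and the classifier scores are all hypothetical and chosen only for illustration; none of the values come from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (feature name, information gain) pairs, best feature first."""
    scored = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three discretized features, 1 = malignant, 0 = benign.
NAMES = ["Circularity", "Contrast", "Noise"]
ROWS = [
    ("low", "high", "a"),
    ("low", "high", "b"),
    ("low", "low", "a"),
    ("high", "low", "b"),
    ("high", "high", "a"),
    ("high", "low", "b"),
]
LABELS = [1, 1, 1, 0, 0, 0]

# Hypothetical classifier outputs for the same six cases.
SCORES = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

if __name__ == "__main__":
    for name, gain in rank_features(ROWS, LABELS, NAMES):
        print(f"{name}: {gain:.3f}")
    print(f"AUC: {auc(SCORES, LABELS):.3f}")
```

In a study like the one described, a ranking of this kind would be used to form feature subsets of increasing size, and each subset would be scored by the AUC of a classifier trained on it. The pairwise AUC above is quadratic in the number of cases, which is adequate for a sketch but not for large datasets.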
