Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis

Breast cancer is a major global health problem and the most common cancer among women. Late diagnosis significantly decreases patient survival, whereas early detection through mammography has been shown to increase it. The purpose of this paper is to obtain a multivariate model that classifies tumor lesions as benign or malignant, as part of a computer-assisted diagnosis approach, by applying a genetic algorithm to training and test datasets of mammography image features. A multivariate search was conducted with several approaches to obtain predictive models, so that results could be compared and validated. The multivariate models were constructed using Random Forest, Nearest Centroid, and K-Nearest Neighbor (K-NN) strategies as the cost function in a genetic algorithm applied to the features in the BCDR public databases. According to their fitness values, the results suggest that the model composed of the two selected texture descriptor features classifies the outcome as well as or better than the multivariate model composed of all the features. This model can help reduce the workload of radiologists and provide a second opinion in the classification of tumor lesions.
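The general idea of using a classifier's accuracy as the fitness of a genetic algorithm that searches over feature subsets can be illustrated with a minimal sketch. It is not the pipeline used in the paper: the synthetic data (in which only features 0 and 1 carry signal), the function names, the population settings, and the simple hold-out split are all assumptions made for illustration, and only the nearest-centroid strategy is shown.

```python
import random


def make_data(rng, n=120, n_features=8):
    """Synthetic stand-in for image descriptors; only features 0 and 1 are informative."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so every split stays balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.5 * label
        row[1] -= 2.5 * label
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X_tr, y_tr, X_te, y_te, features):
    """Accuracy of a nearest-centroid classifier restricted to `features`."""
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )

    return sum(predict(x) == t for x, t in zip(X_te, y_te)) / len(y_te)


def select_features(X, y, rng, model_size=2, pop_size=20, generations=15):
    """Genetic search for a feature subset of fixed size that maximizes accuracy."""
    n_features = len(X[0])
    split = len(X) * 2 // 3  # hypothetical hold-out split: first 2/3 train, rest test
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

    def fitness(chromosome):
        return centroid_accuracy(X_tr, y_tr, X_te, y_te, chromosome)

    # Each chromosome is a sorted tuple of distinct feature indices.
    population = [
        tuple(sorted(rng.sample(range(n_features), model_size)))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child's features from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), model_size)
            if rng.random() < 0.2:
                # Mutation: replace one gene with a random feature index.
                child[rng.randrange(model_size)] = rng.randrange(n_features)
            child = tuple(sorted(set(child)))
            if len(child) == model_size:  # discard children with duplicate genes
                children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best, score = select_features(X, y, rng)
    print("selected features:", best, "hold-out accuracy:", round(score, 3))
```

Because the same hold-out split both guides the search and scores the final subset, the reported accuracy is optimistic. A real study would confirm the selected model on data that played no part in the search.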
