Breast Cancer Diagnosis Using a Hybrid Genetic Algorithm for Feature Selection Based on Mutual Information

Feature selection is the process of choosing a subset of relevant features (i.e., predictors) for use in constructing predictive models. This paper proposes a hybrid feature selection approach to breast cancer diagnosis that combines a Genetic Algorithm (GA) with Mutual Information (MI) to select the combination of cancer predictors with maximal discriminative capability. The selected features are then fed into a classifier that predicts whether a patient has breast cancer. Using a publicly available breast cancer dataset, experiments evaluated the MI-based GA with two machine learning classifiers, k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM), tuned using different distance measures and kernel functions, respectively. The results show that the proposed hybrid approach is highly accurate for predicting breast cancer and is promising for predicting other cancers from clinical data.
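
As a rough illustration of how a GA can be paired with MI for feature selection, the following Python sketch evolves binary feature masks and scores each mask with an MI-based fitness. It is not the paper's implementation: the fitness function (mean relevance to the label minus mean pairwise redundancy), tournament selection, one-point crossover, bit-flip mutation, all parameter values, and the toy data are assumptions chosen for illustration, and features are assumed to be discrete.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def fitness(mask, columns, labels):
    """Assumed score: mean MI with the label minus mean pairwise MI between features."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return float("-inf")  # an empty feature subset is never acceptable
    relevance = sum(mutual_information(columns[j], labels) for j in chosen) / len(chosen)
    pairs = [(a, b) for i, a in enumerate(chosen) for b in chosen[i + 1:]]
    redundancy = (
        sum(mutual_information(columns[a], columns[b]) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    return relevance - redundancy


def select_features(columns, labels, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Evolve binary masks over the features; return indices of the best mask found."""
    rng = random.Random(seed)
    n = len(columns)  # requires at least two features for one-point crossover

    def score(mask):
        return fitness(mask, columns, labels)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            # Tournament selection of two parents (tournament size 3)
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
        best = max(population, key=score)
    return [j for j, bit in enumerate(best) if bit]


# Toy data (hypothetical): feature 0 copies the label, the others are independent of it.
labels = [i % 2 for i in range(48)]
columns = [
    labels[:],
    [(i // 2) % 2 for i in range(48)],
    [(i // 4) % 2 for i in range(48)],
    [(i // 8) % 2 for i in range(48)],
]
selected = select_features(columns, labels)
print("selected feature indices:", selected)
```

In a full pipeline, the selected columns would then be passed to a classifier such as k-NN or SVM; with continuous clinical measurements, the features would first need to be discretized or the MI estimator replaced with one suited to continuous data.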
