A Novel Hybrid Feature Selection Model for Classification of Neuromuscular Dystrophies Using Bhattacharyya Coefficient, Genetic Algorithm and Radial Basis Function Based Support Vector Machine

Accurate classification of neuromuscular disorders is important for providing patients with proper treatment. Microarray technology has recently been employed to monitor the activity, or expression level, of a large number of genes simultaneously. Gene expression data derived from microarray experiments usually involve a large number of genes but very few samples. The dimension of such data therefore needs to be reduced to a small set of discriminative genes that accurately classifies samples of the various diseases. Our goal is to find a small subset of genes that ensures accurate classification of neuromuscular disorders. In this paper, we propose a novel hybrid feature selection model for this task. Feature selection is done in two phases by integrating the Bhattacharyya coefficient and a genetic algorithm (GA). In the first phase, we compute the Bhattacharyya coefficient to choose a candidate gene subset by removing the most redundant genes. In the second phase, the target gene subset is created by applying the GA to select the most discriminative genes from the candidates, with the fitness function calculated using a radial basis function support vector machine (RBF SVM). The proposed hybrid algorithm is applied to two publicly available microarray datasets of neuromuscular disorders. The results are compared with two individual feature selection techniques, namely the Bhattacharyya coefficient and the GA, and with one integrated technique, Bhattacharyya-GA, in which the fitness function of the GA is calculated using four other classifiers. The comparison shows that the proposed integrated method gives better classification accuracy.
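The two-phase procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the Bhattacharyya coefficient is turned into a redundancy score, so the sketch assumes it is computed per gene between the expression histograms of two classes, with low overlap treated as more useful. The function names, the bin count, the GA operators (truncation selection, one-point crossover, bit-flip mutation) and the toy data are all hypothetical. To keep the sketch dependency-free, the fitness is a leave-one-out nearest-centroid accuracy standing in for the RBF SVM accuracy used in the paper.

```python
import math
import random


def bhattacharyya_coefficient(a, b, bins=5):
    """Histogram overlap of two samples: 1.0 = identical, 0.0 = disjoint."""
    lo, hi = min(a + b), max(a + b)
    width = (hi - lo) / bins or 1.0  # guard against a constant gene

    def hist(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    return sum(math.sqrt(p * q) for p, q in zip(hist(a), hist(b)))


def filter_genes(X, y, keep):
    """Phase 1 (assumed form): keep the genes whose class histograms overlap least."""
    def overlap(j):
        return bhattacharyya_coefficient(
            [row[j] for row, lab in zip(X, y) if lab == 0],
            [row[j] for row, lab in zip(X, y) if lab == 1],
        )
    return sorted(sorted(range(len(X[0])), key=overlap)[:keep])


def loo_centroid_accuracy(X, y, mask):
    """Stand-in fitness (NOT an RBF SVM): leave-one-out nearest-centroid accuracy.
    Assumes every class keeps at least one sample after leaving one out."""
    cols = [j for j, on in enumerate(mask) if on]
    if not cols:
        return 0.0
    hits = 0
    for i in range(len(X)):
        cents = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            cents[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        pred = min(cents, key=lambda c: sum(
            (X[i][j] - m) ** 2 for j, m in zip(cols, cents[c])))
        hits += pred == y[i]
    return hits / len(X)


def genetic_search(n_genes, fitness, pop_size=20, generations=30,
                   p_mut=0.05, rng=None):
    """Phase 2: GA over boolean gene masks (requires n_genes >= 2)."""
    rng = rng or random.Random(0)
    pop = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = parents + children  # parents survive, so the best mask is never lost
    return max(pop, key=fitness)


# Toy data (hypothetical): 8 samples x 4 genes; genes 0 and 3 separate the classes.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [
    [1.0, 2, 1.0, 0.0], [1.2, 3, 1.1, 0.2], [0.9, 2, 4.0, 0.1], [1.1, 3, 4.1, 0.3],
    [5.0, 2, 1.0, 3.0], [5.2, 3, 1.1, 3.1], [4.9, 2, 4.0, 2.9], [5.1, 3, 4.1, 3.2],
]
candidates = filter_genes(X, y, keep=3)              # phase 1: candidate subset
Xc = [[row[j] for j in candidates] for row in X]
best = genetic_search(len(candidates), lambda m: loo_centroid_accuracy(Xc, y, m))
selected = [g for g, on in zip(candidates, best) if on]  # phase 2: target subset
print(candidates, selected)
```

To move closer to the described method, the stand-in fitness could be replaced by the cross-validated accuracy of an RBF-kernel SVM, for example scikit-learn's `SVC(kernel="rbf")` scored with `cross_val_score`, restricted to the columns enabled in the mask. Because the GA only sees a fitness callable, that swap does not change the search loop.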
