A novel gene selection method using GA/SVM and Fisher criteria in Alzheimer's disease

Identifying the genes that cause disease can improve diagnosis and treatment. In this paper, a gene selection method based on a genetic algorithm (GA) and support vector machines (SVM) is presented. First, the Fisher criterion is used to filter out noisy and redundant genes in high-dimensional microarray data. Then, a GA/SVM model selects various subsets of maximally informative genes using different training sets. The frequency with which each gene appears across these subsets is analyzed, so that the final subset contains only the genes that are selected most often and are therefore highly informative. In this way, the Fisher and GA/SVM methods are combined to benefit from both a filter method and an embedded method. The proposed method is tested on DNA microarray gene expression data of Alzheimer's disease. The results show that the proposed method achieves good selection and classification performance, yielding 100% classification accuracy using only 15 genes. From a biological point of view, at least 8 (53%) of these genes are Alzheimer-associated genes. Thus, these genes can not only serve as predictors of the disease but can also be used as a means to find new candidate genes.
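The filtering and frequency-analysis steps can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the code assumes the common two-class Fisher score, (mu0 - mu1)^2 / (var0 + var1), and a simple count of how often each gene appears across subsets. The function names, the toy expression values, and the hard-coded subsets (which stand in for the output of repeated GA/SVM runs) are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score (assumed form): (mu0 - mu1)^2 / (var0 + var1)."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    diff_sq = (mean(class0) - mean(class1)) ** 2
    spread = pvariance(class0) + pvariance(class1)
    if spread == 0:
        # No within-class variation: perfectly separating or uninformative.
        return float("inf") if diff_sq > 0 else 0.0
    return diff_sq / spread


def fisher_filter(expr, labels, keep):
    """Rank genes by Fisher score and keep the top `keep` gene names.

    `expr` maps a gene name to its expression values, one per sample.
    """
    ranked = sorted(expr, key=lambda g: fisher_score(expr[g], labels), reverse=True)
    return ranked[:keep]


def most_frequent_genes(subsets, top_k):
    """Return the `top_k` genes that appear in the most subsets."""
    counts = Counter(gene for subset in subsets for gene in set(subset))
    return [gene for gene, _ in counts.most_common(top_k)]


# Toy data (hypothetical): three genes, six samples, labels 0 = control, 1 = case.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # well separated between classes
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # same mean in both classes
    "geneC": [0.5, 1.5, 1.0, 1.4, 2.0, 1.7],  # weakly separated
}

filtered = fisher_filter(expr, labels, keep=2)

# Stand-in for subsets chosen by GA/SVM on different training sets.
ga_svm_subsets = [["geneA", "geneC"], ["geneA"], ["geneC", "geneA"]]
final_genes = most_frequent_genes(ga_svm_subsets, top_k=1)

print(filtered)
print(final_genes)
```

In a full pipeline, the hard-coded subsets would be replaced by the gene subsets that a GA with an SVM fitness function returns for each training set; that stage is omitted here because the abstract does not specify its encoding, operators, or parameters.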
