Identifying genetic biomarkers associated to Alzheimer's disease using Support Vector Machine

Machine learning methods are used to identify genetic biomarkers associated with complex diseases. Alzheimer's disease (AD) is a degenerative disorder that attacks the brain's neurons. Single nucleotide polymorphisms (SNPs) are the most common type of human genetic variation and are useful markers for disease genes. SNPs have been linked to many common and serious diseases, including AD, so discovering SNP biomarkers associated with AD can contribute to early prediction and diagnosis of the disease. Two feature selection methods, correlation-based feature selection (CFS) and chi-squared feature selection, were used to find the most important SNPs. Support Vector Machine (SVM) classifiers with different kernels were then applied to Alzheimer's Disease Neuroimaging Initiative Phase 1 (ADNI-1) data, using the 21 selected variants most associated with AD. Results indicated that the selected variants are strongly associated with AD and that the SVM model trained with the radial basis function (RBF) kernel achieved the better accuracy among the kernels compared, 76.70%.
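The abstract names chi-squared feature selection but does not describe how it was implemented. As a rough illustration only, the following sketch scores each SNP by the Pearson chi-squared statistic of its genotype-by-diagnosis contingency table and keeps the top-ranked ones. The function names, the 0/1/2 genotype coding (assumed to be minor-allele counts), and the toy data are all hypothetical; this is not the authors' pipeline and uses no ADNI data.

```python
from collections import Counter


def chi2_score(genotypes, labels):
    """Pearson chi-squared statistic for one SNP against case/control labels.

    genotypes: list of genotype codes (assumed 0/1/2 minor-allele counts).
    labels: list of class labels (assumed 1 = AD case, 0 = control).
    """
    n = len(labels)
    observed = Counter(zip(genotypes, labels))  # cell counts of the table
    genotype_totals = Counter(genotypes)        # row totals
    label_totals = Counter(labels)              # column totals
    score = 0.0
    for g, g_n in genotype_totals.items():
        for c, c_n in label_totals.items():
            expected = g_n * c_n / n            # count expected under independence
            score += (observed[(g, c)] - expected) ** 2 / expected
    return score


def top_k_snps(snp_columns, labels, k):
    """Return the names of the k SNPs with the largest chi-squared scores."""
    scores = {name: chi2_score(col, labels) for name, col in snp_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): 4 cases followed by 4 controls, 3 made-up SNPs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 1, 2, 0, 0, 1, 0],  # genotype tracks diagnosis closely
    "snp_b": [0, 1, 2, 0, 0, 1, 2, 0],  # identical distribution in both groups
    "snp_c": [1, 1, 1, 1, 0, 1, 1, 0],  # weak difference between groups
}
selected = top_k_snps(snps, labels, k=2)
print(selected)
```

In a study like the one described, the selected SNP columns would then be passed to an SVM classifier and the kernels compared on held-out accuracy; that step is omitted here because the abstract gives no training details.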
