A genetic algorithm-support vector machine method with parameter optimization for selecting the tag SNPs

Single nucleotide polymorphisms (SNPs) account for millions of variations in the human genome and are therefore promising tools for disease-gene association studies. However, such studies are constrained by the high cost of genotyping millions of SNPs. It is therefore necessary to obtain a suitable subset of SNPs, called tag SNPs, that accurately represents the remaining SNPs. Many methods have been developed to select such a subset, but their prediction accuracy remains low. In the present study, a new method, GA-SVM with parameter optimization, is introduced. The method uses a support vector machine (SVM) to predict the non-tag SNPs and a genetic algorithm (GA) to select the tag SNPs. It also uses the particle swarm optimization (PSO) algorithm to optimize the C and γ parameters of the SVM. The method is tested experimentally on a wide range of datasets, and the results demonstrate that it provides better prediction accuracy in identifying tag SNPs than existing methods.
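The wrapper structure described above, in which a GA searches over candidate tag sets and scores each one by how well the remaining SNPs can be predicted from it, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the function names, the GA operators (truncation selection, crossover by sampling from the parents' union, single-site mutation), the toy data, and above all the predictor, which is a simple nearest-haplotype rule standing in for the SVM. In the method described, an SVM whose C and γ are tuned by PSO would take the place of `predict_accuracy`.

```python
import random


def predict_accuracy(haplotypes, tags):
    """Leave-one-out accuracy of predicting non-tag SNPs from tag SNPs.

    Stand-in predictor (an assumption, not the paper's SVM): copy the alleles
    of the other haplotype that agrees with the target on the most tag SNPs.
    """
    n_snps = len(haplotypes[0])
    non_tags = [j for j in range(n_snps) if j not in tags]
    if not non_tags:
        return 1.0
    correct = 0
    for i, target in enumerate(haplotypes):
        others = [h for k, h in enumerate(haplotypes) if k != i]
        nearest = max(others, key=lambda h: sum(h[t] == target[t] for t in tags))
        correct += sum(nearest[j] == target[j] for j in non_tags)
    return correct / (len(haplotypes) * len(non_tags))


def select_tags(haplotypes, n_tags, pop_size=20, generations=30,
                mutation_rate=0.2, seed=0):
    """Hypothetical GA: each individual is a sorted tuple of tag SNP indices."""
    rng = random.Random(seed)
    n_snps = len(haplotypes[0])

    def fitness(tags):
        return predict_accuracy(haplotypes, tags)

    population = [tuple(sorted(rng.sample(range(n_snps), n_tags)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        elite = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Crossover: draw the child's tags from the union of both parents.
            child = set(rng.sample(sorted(set(a) | set(b)), n_tags))
            if rng.random() < mutation_rate:
                # Mutation: swap one tag for a SNP not currently in the set.
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([j for j in range(n_snps) if j not in child]))
            children.append(tuple(sorted(child)))
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy data (made up): SNPs 0-2 are perfectly correlated, as are SNPs 3-5.
toy = [(0, 0, 0, 0, 0, 0), (0, 0, 0, 1, 1, 1),
       (1, 1, 1, 0, 0, 0), (1, 1, 1, 1, 1, 1)] * 2
best_tags, accuracy = select_tags(toy, n_tags=2)
print(best_tags, accuracy)
```

On this toy data, a tag set is fully informative only if it takes one SNP from each correlated group, so the fitness function rewards exactly the kind of non-redundant subset that tag SNP selection aims for.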
