Multi-objective tag SNPs selection using evolutionary algorithms

MOTIVATION Integrated analysis of single nucleotide polymorphisms (SNPs) and structural variations has shown that linkage disequilibrium is common across different types of genetic variants. A subset of SNPs (called tag SNPs) is sufficient for capturing the alleles of bi-allelic and even multi-allelic variants. However, the accuracy and power of tag SNPs are affected by several factors, including genotyping failures, genotyping errors and tagging bias toward certain alleles. In addition, different sets of tag SNPs should be selected to meet the requirements of different genotyping platforms and projects.

RESULTS This study formulates tag SNP selection as a four-objective optimization problem that minimizes the total number of tag SNPs, maximizes tolerance for missing data, and both increases and balances the detection power for each allele class. To solve this problem, we propose evolutionary algorithms combined with greedy initialization to find non-dominated solutions that consider all objectives simultaneously. This method gives users the flexibility to extract different sets of tag SNPs for different platforms and scenarios (e.g. up to 100 tags and a 10% missing rate). Compared to conventional methods, our method explores a larger search space and requires a shorter convergence time. Experimental results revealed both strong and weak conflicts among these objectives. In particular, a small number of additional tag SNPs can provide sufficient tolerance and balanced power given the low missing and error rates of today's genotyping platforms.

AVAILABILITY The software is freely available at Bioinformatics online and http://cilab.cs.ccu.edu.tw/service_dl.html.
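To make the notion of non-dominated solutions concrete, the following minimal Python sketch filters a handful of candidate tag sets down to their Pareto front. It is an illustration only, not the authors' implementation: the tuple layout `(num_tags, tolerance, power, balance)`, the function names, and all numeric values are hypothetical, and the sketch assumes that the last three objectives are scores to be maximized while the number of tags is minimized.

```python
from typing import List, Tuple

# Hypothetical objective vector for one candidate tag set:
# (num_tags, tolerance, power, balance)
# num_tags is minimized; the other three are assumed to be maximized.
Objectives = Tuple[float, float, float, float]


def to_minimization(obj: Objectives) -> Objectives:
    """Negate the maximized objectives so every entry is minimized."""
    num_tags, tolerance, power, balance = obj
    return (num_tags, -tolerance, -power, -balance)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    a_min, b_min = to_minimization(a), to_minimization(b)
    no_worse = all(x <= y for x, y in zip(a_min, b_min))
    strictly_better = any(x < y for x, y in zip(a_min, b_min))
    return no_worse and strictly_better


def non_dominated(candidates: List[Objectives]) -> List[Objectives]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c
        for i, c in enumerate(candidates)
        if not any(dominates(o, c) for j, o in enumerate(candidates) if j != i)
    ]


# Made-up illustrative values, not results from the paper.
candidates = [
    (10, 0.02, 0.80, 0.70),
    (12, 0.05, 0.85, 0.75),
    (12, 0.04, 0.80, 0.70),  # dominated by the second candidate
    (15, 0.10, 0.90, 0.90),
    (16, 0.10, 0.90, 0.85),  # dominated by the fourth candidate
]

front = non_dominated(candidates)
for solution in front:
    print(solution)
```

A user constraint such as "up to 100 tags" could then be applied as a simple filter over the resulting front, which is one plausible reading of the flexibility the abstract describes.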
