Evolutionary algorithms for the selection of single nucleotide polymorphisms

Background: Large databases of single nucleotide polymorphisms (SNPs) are available for use in genomics studies. Typically, investigators must choose a subset of SNPs from these databases to employ in their studies. The choice of subset is influenced by many factors, including estimated or known reliability of the SNP, biochemical factors, intellectual property, cost, and effectiveness of the subset for mapping genes or identifying disease loci. We present an evolutionary algorithm for multiobjective SNP selection.

Results: We implemented a modified version of the Strength-Pareto Evolutionary Algorithm (SPEA2) in Java. Our implementation, Multiobjective Analyzer for Genetic Marker Acquisition (MAGMA), approximates the set of optimal trade-off solutions for large problems in minutes. This set is useful for the design of large studies, including those oriented towards disease identification, genetic mapping, population studies, and haplotype-block elucidation.

Conclusion: Evolutionary algorithms are particularly suited to optimization problems that involve multiple objectives and a complex search space to which exact methods such as exhaustive enumeration cannot be applied. They also provide flexibility if the problem formulation evolves or changes. Results are produced as a trade-off front, allowing the user to make informed decisions when prioritizing factors. MAGMA is open source and available at http://snp-magma.sourceforge.net. Evolutionary algorithms are well suited for many other applications in genomics.
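To make the notion of a trade-off front concrete, the following is a minimal Java sketch of Pareto dominance filtering, the idea underlying Pareto-based methods such as SPEA2. It is not MAGMA's code and not a full SPEA2 implementation (there is no population, archive, fitness assignment, or density estimation). The class name `ParetoFront`, its methods, and the two objectives (assay cost and number of uncovered loci, both minimized) are hypothetical choices made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: hypothetical names, not taken from MAGMA. */
final class ParetoFront {

    /**
     * Returns true if a dominates b: a is no worse than b in every
     * objective and strictly better in at least one (all minimized).
     */
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;          // a is worse in objective i
            }
            if (a[i] < b[i]) {
                strictlyBetter = true; // a is better in objective i
            }
        }
        return strictlyBetter;
    }

    /** Returns the indices of candidates that no other candidate dominates. */
    static List<Integer> nonDominated(double[][] scores) {
        List<Integer> front = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            boolean dominated = false;
            for (int j = 0; j < scores.length && !dominated; j++) {
                if (j != i && dominates(scores[j], scores[i])) {
                    dominated = true;
                }
            }
            if (!dominated) {
                front.add(i);
            }
        }
        return front;
    }

    public static void main(String[] args) {
        // Hypothetical candidate SNP subsets, scored as
        // {assay cost, number of uncovered loci}; both are minimized.
        double[][] candidates = {
            {10.0, 5.0},
            {12.0, 3.0},
            {15.0, 3.0}, // dominated by candidate 1 (cheaper, same coverage)
            {20.0, 1.0},
            {11.0, 6.0}, // dominated by candidate 0 (cheaper, better coverage)
        };
        System.out.println(nonDominated(candidates)); // prints [0, 1, 3]
    }
}
```

The three surviving candidates form the trade-off front in this toy example: none can be improved in one objective without giving ground in the other, so the choice among them is left to the investigator. This pairwise filter is quadratic in the number of candidates, which is acceptable for a sketch but is not a claim about how MAGMA handles large problems.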
