A greedier approach for finding tag SNPs

MOTIVATION: Recent studies have shown that a small subset of single nucleotide polymorphisms (SNPs), called tag SNPs, is sufficient to capture the haplotype patterns in a region of high linkage disequilibrium. Exact algorithms for finding the minimum set of tag SNPs can take exponential time. Approximation algorithms are more efficient but may fail to find the optimal solution.

RESULTS: We propose a hybrid method that combines ideas from the branch-and-bound method and the greedy algorithm. The method explores a larger portion of the solution space than a traditional greedy algorithm and can therefore obtain a better solution. It also lets the user trade running time against solution quality. The algorithm has been implemented and tested on a variety of simulated and biological data. The experimental results indicate that our program can find better solutions than previous methods. The approach is quite general: it can be used to adapt other greedy algorithms to their corresponding problems.

AVAILABILITY: The program is available upon request.
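The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a greedy search might be widened with branch-and-bound ideas, not the authors' implementation. It assumes tag SNP selection is cast as a set-cover problem in which the chosen SNPs must tell apart every pair of haplotypes that the full SNP set tells apart. The function names `distinguishing_pairs` and `find_tag_snps` and the branching width `k` are hypothetical, introduced here for illustration. With `k=1` the search is the plain greedy heuristic; a larger `k` explores more of the search tree at higher cost, which is one plausible way to expose the efficiency/quality trade-off the abstract mentions.

```python
from itertools import combinations


def distinguishing_pairs(haplotypes):
    """Map each SNP index to the set of haplotype pairs it tells apart."""
    n_snps = len(haplotypes[0])
    return {
        s: {
            (i, j)
            for i, j in combinations(range(len(haplotypes)), 2)
            if haplotypes[i][s] != haplotypes[j][s]
        }
        for s in range(n_snps)
    }


def find_tag_snps(haplotypes, k=1):
    """Greedy search widened to the k best candidates per step (assumed design).

    k=1 follows only the best candidate at each step (plain greedy).
    Larger k also tries the next-best candidates, pruning any branch
    that cannot beat the smallest solution found so far.
    """
    pairs_by_snp = distinguishing_pairs(haplotypes)
    # Only pairs that some SNP separates can ever be covered.
    target = set().union(*pairs_by_snp.values())
    best = None

    def search(chosen, uncovered):
        nonlocal best
        if not uncovered:
            if best is None or len(chosen) < len(best):
                best = list(chosen)
            return
        # Bound: at least one more SNP is needed on this branch.
        if best is not None and len(chosen) + 1 >= len(best):
            return
        # Rank unused SNPs by how many uncovered pairs they separate.
        ranked = sorted(
            (s for s in pairs_by_snp if s not in chosen),
            key=lambda s: (-len(pairs_by_snp[s] & uncovered), s),
        )
        for s in ranked[:k]:
            gain = pairs_by_snp[s] & uncovered
            if not gain:
                break  # remaining candidates add nothing
            chosen.append(s)
            search(chosen, uncovered - gain)
            chosen.pop()

    search([], target)
    return sorted(best)


# Toy data: four haplotypes over three biallelic SNPs.
haps = ["000", "011", "101", "110"]
print(find_tag_snps(haps, k=1))  # [0, 1]
```

Because the first path explored for any `k` is the same one the greedy heuristic follows, a larger `k` in this sketch can only match or improve on the greedy result, never worsen it.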
