Abstract

Background: In population-based studies, it is generally recognized that single nucleotide polymorphism (SNP) markers are not independent. Rather, they are carried by haplotypes, groups of SNPs that tend to be coinherited. It is thus possible to choose a much smaller number of SNPs to use as indices for identifying haplotypes or haplotype blocks in genetic association studies. We refer to these characteristic SNPs as index SNPs. To reduce cost and effort, a minimum number of index SNPs that can distinguish all SNP and haplotype patterns should be chosen. Unfortunately, this is an NP-complete problem, and exact solutions rely on brute-force algorithms that are not feasible for large data sets.

Results: We have developed a double classification tree search algorithm to generate index SNPs that can distinguish all SNP and haplotype patterns. This algorithm runs very rapidly and generates very good, though not necessarily minimum, sets of index SNPs, as is to be expected for such NP-complete problems.

Conclusions: A new algorithm for index SNP selection has been developed. A webserver for index SNP selection is available at http://cognia.cu-genome.org/cgi-bin/genome/snpIndex.cgi/
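
To make the selection problem concrete, the following is a minimal sketch of a simple greedy heuristic. It is not the double classification tree search algorithm described above, whose details are not given here. It assumes haplotypes are encoded as equal-length strings of `0`/`1` alleles, and it only addresses distinguishing haplotype patterns, not SNP patterns. The function name `greedy_index_snps` is hypothetical.

```python
from itertools import combinations


def greedy_index_snps(haplotypes):
    """Greedily pick SNP positions until every pair of distinct
    haplotype patterns differs at some chosen position.

    Illustrative heuristic only: the result distinguishes all
    patterns but is not guaranteed to be a minimum set.
    """
    patterns = sorted(set(haplotypes))  # distinct haplotype patterns
    if not patterns:
        return []
    n_snps = len(patterns[0])

    # Pairs of distinct patterns that no chosen SNP separates yet.
    unresolved = set(combinations(range(len(patterns)), 2))
    chosen = []

    while unresolved:
        # Number of unresolved pairs that SNP position j would separate.
        def gain(j):
            return sum(
                1 for a, b in unresolved if patterns[a][j] != patterns[b][j]
            )

        # Distinct patterns differ somewhere, so the best gain is >= 1
        # and the loop always makes progress.
        best = max(range(n_snps), key=gain)
        chosen.append(best)
        unresolved = {
            (a, b)
            for a, b in unresolved
            if patterns[a][best] == patterns[b][best]
        }

    return sorted(chosen)


if __name__ == "__main__":
    haps = ["00000", "00111", "11000", "11111"]
    print(greedy_index_snps(haps))
```

In this toy input, positions 0 and 1 carry the same information, as do positions 2 through 4, so one position from each group is enough to tell the four haplotypes apart. A greedy choice like this runs quickly but, as with any heuristic for an NP-complete problem, may return more SNPs than the true minimum.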