Dynamic programming algorithms for haplotype block partitioning: applications to human chromosome 21 haplotype data

Recent studies have shown that the human genome has a haplotype block structure, such that it can be divided into discrete blocks of limited haplotype diversity. Patil et al. [6] and Zhang et al. [12] developed algorithms to partition haplotypes into blocks with the minimum number of tag SNPs for the entire chromosome. However, it is not clear how to partition haplotypes into blocks with a restricted number of tag SNPs when only limited resources are available. In this paper, we first formulate this problem as finding a block partition with a fixed number of tag SNPs that covers the maximal percentage of the genome. We then solve it with two dynamic programming algorithms, which are flexible enough to take into account knowledge of functional polymorphisms. We applied our algorithms to the published SNP data of human chromosome 21, combined with the functional information of these SNPs, and demonstrated their effectiveness. A statistical investigation of the relationship between the starting points of a block partition and the coding and non-coding regions showed that the SNPs at these starting points are not significantly enriched in coding regions. We also developed an efficient algorithm to find all long local maximal haplotypes shared across a subset of samples. Applying this algorithm to the human chromosome 21 haplotype data, we found that samples sharing long local haplotypes are not necessarily globally similar.
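The fixed-budget formulation above can be illustrated with a small dynamic program. This is only a minimal sketch under stated assumptions, not the algorithms of the paper: the function name `max_coverage_partition`, the input format (a precomputed dictionary of candidate blocks with their tag SNP counts and lengths), and the toy numbers are all hypothetical. It assumes that candidate blocks have already been identified by some block-definition criterion, and it maximizes the total length covered by non-overlapping blocks using at most a given number of tag SNPs.

```python
def max_coverage_partition(n_snps, blocks, max_tags):
    """Maximize covered length using at most max_tags tag SNPs.

    blocks maps (start, end), inclusive 0-based SNP indices, to a pair
    (tag_count, length). This input format is an assumption of the sketch.
    Returns (best_length, list_of_chosen_blocks).
    """
    # best[j][k]: maximal length covered within the first j SNPs
    # using at most k tag SNPs.
    best = [[0] * (max_tags + 1) for _ in range(n_snps + 1)]
    choice = [[None] * (max_tags + 1) for _ in range(n_snps + 1)]

    # Index candidate blocks by their last SNP.
    by_end = {}
    for (start, end), (tags, length) in blocks.items():
        by_end.setdefault(end, []).append((start, tags, length))

    for j in range(1, n_snps + 1):
        for k in range(max_tags + 1):
            # Option 1: leave SNP j-1 outside every block.
            best[j][k] = best[j - 1][k]
            # Option 2: end a block at SNP j-1.
            for start, tags, length in by_end.get(j - 1, []):
                if tags <= k:
                    cand = best[start][k - tags] + length
                    if cand > best[j][k]:
                        best[j][k] = cand
                        choice[j][k] = (start, tags)

    # Trace back the chosen blocks.
    chosen = []
    j, k = n_snps, max_tags
    while j > 0:
        if choice[j][k] is None:
            j -= 1
        else:
            start, tags = choice[j][k]
            chosen.append((start, j - 1))
            j, k = start, k - tags
    chosen.reverse()
    return best[n_snps][max_tags], chosen


# Toy candidate blocks (hypothetical values): (start, end) -> (tag SNPs, length).
toy_blocks = {(0, 1): (1, 10), (2, 4): (2, 25), (0, 4): (3, 30), (3, 4): (1, 12)}
result_two_tags = max_coverage_partition(5, toy_blocks, 2)
result_three_tags = max_coverage_partition(5, toy_blocks, 3)
```

In this toy input, a budget of two tag SNPs is best spent on the single block (2, 4), while a budget of three covers more by combining blocks (0, 1) and (2, 4) than by taking the single block (0, 4). The running time of the sketch is proportional to the number of candidate blocks times the tag SNP budget, plus the size of the table.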

[1]  C. Nusbaum, et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome, 1998, Science.

[2]  Frank Dudbridge, et al. Haplotype tagging for the identification of common disease genes, 2001, Nature Genetics.

[3]  Dan Gusfield, et al. Parametric optimization of sequence alignment, 1992, SODA '92.

[4]  N. Saitou, et al. The neighbor-joining method: a new method for reconstructing phylogenetic trees, 1987, Molecular Biology and Evolution.

[5]  S. P. Fodor, et al. Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21, 2001, Science.

[6]  M. Daly, et al. High-resolution haplotype structure in the human genome, 2001, Nature Genetics.

[7]  M. Waterman, et al. A dynamic programming algorithm for haplotype block partitioning, 2002, Proceedings of the National Academy of Sciences of the United States of America.

[8]  Lon R. Cardon, et al. A first-generation linkage disequilibrium map of human chromosome 22, 2002, Nature.

[9]  Fengzhu Sun, et al. Haplotype block structure and its applications to association studies: power and study designs, 2002, American Journal of Human Genetics.

[10]  Susan R. Wilson. Introduction to Computational Biology: Maps, Sequences and Genomes, 1996.

[11]  E. Lander, et al. Parametric sequence comparisons, 1992, Proceedings of the National Academy of Sciences of the United States of America.

[12]  S. Gabriel, et al. The Structure of Haplotype Blocks in the Human Genome, 2002, Science.