Analysis of concordance of different haplotype block partitioning algorithms

Background
Different classes of haplotype block algorithms exist. The ideal dataset for assessing their performance would come from comprehensively re-sequencing a large genomic region in a large population, but such datasets are expensive to collect. As an alternative, we performed coalescent simulations to generate haplotypes with a high marker density. We then compared block partitioning results from diversity-based, LD-based, and information-theoretic algorithms under different values of SNP density and allele frequency.

Results
We used the standard coalescent to simulate 1000 haplotypes for each of three world populations (European, African American, and East Asian). We then applied three classes of block partitioning algorithms (diversity-based, LD-based, and information-theoretic). We assessed differences among the algorithms in the number, size, and coverage of blocks inferred under different conditions of SNP density, allele frequency, and sample size.

Each algorithm inferred blocks that differed in number, size, and coverage across SNP density and allele frequency conditions. Different partitions had few, if any, matching block boundaries. However, the blocks still overlapped, and a high percentage of the total chromosomal region was common to all methods. This percentage was generally higher at higher SNP density and when rarer markers were included.

Conclusion
A gold-standard definition of a haplotype block is difficult to achieve. Collecting haplotypes covered with a high density of SNPs, partitioning them with a variety of block algorithms, and identifying the regions common to all methods may be the best way to find genomic regions that harbor disease-causing SNP variants.
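The abstract contrasts two ways of comparing partitions: exact agreement of block boundaries, and the fraction of the region covered by blocks under every method. It does not say how either quantity was computed. The sketch below is one possible formulation. It assumes that each method's output is a list of half-open base-pair intervals `[start, end)`. It defines the common region as the intersection of block-covered sequence across all methods. The function names and the toy partitions are hypothetical and are not the authors' code.

```python
from typing import List, Set, Tuple

# Assumption: a block is a half-open [start, end) interval in base pairs.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, merged interval lists with a two-pointer sweep."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def common_coverage(partitions: List[List[Interval]], region: Interval) -> float:
    """Fraction of `region` that lies inside a block in every partition."""
    common = merge(partitions[0])
    for blocks in partitions[1:]:
        common = intersect(common, merge(blocks))
    common = intersect(common, [region])  # clip to the analysed region
    covered = sum(end - start for start, end in common)
    return covered / (region[1] - region[0])


def shared_boundaries(partitions: List[List[Interval]]) -> Set[int]:
    """Block boundary positions reported identically by every partition."""
    boundary_sets = [{pos for block in blocks for pos in block} for blocks in partitions]
    return set.intersection(*boundary_sets)


# Hypothetical toy partitions of a 100 bp region from three methods.
diversity = [(0, 30), (40, 70)]
ld = [(5, 35), (45, 90)]
mdl = [(10, 25), (38, 80)]

print(shared_boundaries([diversity, ld, mdl]))        # set()
print(common_coverage([diversity, ld, mdl], (0, 100)))  # 0.4
```

In this toy case the three partitions share no boundary. Even so, 40% of the region lies inside a block under all three methods, which mirrors the qualitative pattern the Results describe. A different convention, such as inclusive SNP indices instead of base-pair coordinates, would change only the interval arithmetic.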
