Evaluation of sample size effect on the identification of haplotype blocks

Background: Genome-wide maps of linkage disequilibrium (LD) and haplotypes have been created for different populations. Substantial sharing of boundaries and haplotypes among populations has been observed, but haplotype variation across populations has also been reported. These conflicting observations on the extent and distribution of haplotypes require careful examination. The mechanisms that shape haplotypes have not been fully explored, although an effect of sample size has been implicated. We present a close examination of the effect of sample size on haplotype blocks using an original computational simulation.

Results: A region spanning 19.31 Mb on chromosome 20q was genotyped for 1,147 SNPs in 725 Japanese subjects. One 445 kb region exhibiting strong LD (average |D'| = 0.94) was selected for the analysis of the effect of sample size on haplotype structure. Three block definitions (recombination-based, LD-based, and diversity-based) were used to build simulations for block identification, with the θ value taken from the real genotyping data. Estimating a haplotype block proved quite difficult for data sets with fewer than 200 samples. With 50 samples, a reliable haplotype structure could not be obtained even though the simulation was repeated 10,000 times.

Conclusion: These analyses underscore the difficulty of estimating haplotype blocks. To obtain a reliable result, it would be necessary to increase the sample size beyond 725 and to repeat the simulation 3,000 times. Even in a genomic region showing a high LD value, the haplotype block may be fragile. We emphasize the importance of applying careful confidence measures when using an estimated haplotype structure in biomedical research.
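The dependence of an LD estimate on sample size can be illustrated with a small resampling sketch. This is not the simulation used in the study: it assumes phased two-locus haplotypes coded 0/1 (the study worked from genotype data), it computes only pairwise |D'| rather than applying any of the three block definitions, and the function names `d_prime` and `resample_d_prime` and the toy population are hypothetical.

```python
import random


def d_prime(haplotypes):
    """Return |D'| for two biallelic loci.

    `haplotypes` is a list of (a, b) pairs with alleles coded 0/1.
    Returns None if either locus is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # Normalise D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return None
    return abs(d) / d_max


def resample_d_prime(population, sample_size, n_repeats, seed=0):
    """Draw `n_repeats` subsamples without replacement and collect |D'|."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_repeats):
        value = d_prime(rng.sample(population, sample_size))
        if value is not None:
            values.append(value)
    return values


if __name__ == "__main__":
    # Toy population (an assumption for illustration only): strong but
    # incomplete LD between the two loci.
    population = [(1, 1)] * 700 + [(0, 0)] * 700 + [(1, 0)] * 30 + [(0, 1)] * 20
    for size in (50, 200, 725):
        values = resample_d_prime(population, size, n_repeats=1000)
        print(size, round(min(values), 3), round(max(values), 3))
```

Comparing the range of |D'| across subsample sizes gives a rough sense of how much an estimate can move when fewer chromosomes are sampled; a block-level analysis would apply a block definition to many marker pairs rather than a single pair.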
