The Clark Phase-able Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS

A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information generated by the experiment. In the next few years, it is quite possible that millions of Americans will have been genotyped. The question then arises of how to make the best use of this information and jointly estimate the haplotypes of all these individuals. The premise of this article is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent. These tracts can be used as input for a Clark-like phasing method to obtain a phasing solution of the sample. We show on simulated data that the algorithm will get an almost perfect solution if the number of individuals being genotyped is large enough and the correctness of the algorithm grows with the number of individuals being genotyped. We also study a related problem that connects copy number variation with phasing algorithm success. A loss of heterozygosity (LOH) event is when, by the laws of Mendelian inheritance, an individual should be heterozygote but, due to a deletion polymorphism, is not. Such polymorphisms are difficult to detect using existing algorithms, but play an important role in the genetics of disease and will confuse haplotype phasing algorithms if not accounted for. We will present an algorithm for detecting LOH regions across the genomes of thousands of individuals. The design of the long-range phasing algorithm and the loss of heterozygosity inference algorithms was inspired by our analysis of the Multiple Sclerosis (MS) GWAS dataset of the International Multiple Sclerosis Genetics Consortium. We present similar results to those obtained from the MS data.

[1]  D. Conrad,et al.  A high-resolution survey of deletion polymorphism in the human genome , 2006, Nature Genetics.

[2]  M. Daly,et al.  Genetic Mapping in Human Disease , 2008, Science.

[3]  Shibu Yooseph,et al.  A Survey of Computational Methods for Determining Haplotypes , 2002, Computational Methods for SNPs and Haplotype Inference.

[4]  Richard R. Hudson,et al.  Generating samples under a Wright-Fisher neutral model of genetic variation , 2002, Bioinform..

[5]  Sorin Istrail The Haplotype Phasing Problem , 2002 .

[6]  Yurii S. Aulchenko,et al.  Twenty bone mineral density loci identified by large-scale meta-analysis of genome-wide association studies , 2009, Nature Genetics.

[7]  Pardis C Sabeti,et al.  Common deletion polymorphisms in the human genome , 2006, Nature Genetics.

[8]  Thomas W. Mühleisen,et al.  Large recurrent microdeletions associated with schizophrenia , 2008, Nature.

[9]  A. Clark,et al.  Inference of haplotypes from PCR-amplified samples of diploid populations. , 1990, Molecular biology and evolution.

[10]  R. Durbin,et al.  Mapping trait loci by use of inferred ancestral recombination graphs. , 2006, American journal of human genetics.

[11]  P. Donnelly,et al.  A new statistical method for haplotype reconstruction from population data. , 2001, American journal of human genetics.

[12]  B. Browning,et al.  A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. , 2009, American journal of human genetics.

[13]  Eleazar Eskin,et al.  Identification of Deletion Polymorphisms from Haplotypes , 2007, RECOMB.

[14]  N. Siva 1000 Genomes project , 2008, Nature Biotechnology.

[15]  John P A Ioannidis,et al.  Meta-analysis in genome-wide association studies. , 2009, Pharmacogenomics.

[16]  Kari Stefansson,et al.  Multiple genetic loci for bone mineral density and fractures. , 2008, The New England journal of medicine.

[17]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[18]  Joshua M. Korn,et al.  Integrated detection and population-genetic analysis of SNPs and copy number variation , 2008, Nature Genetics.

[19]  Bjarni V. Halldórsson,et al.  Many sequence variants affecting diversity of adult human height , 2008, Nature Genetics.

[20]  P. Donnelly,et al.  A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies , 2009, PLoS genetics.

[21]  John P. A. Ioannidis,et al.  Twenty loci associated with bone mineral density identified by large-scale meta-analysis of genome-wide association datasets , 2009 .

[22]  Roded Sharan,et al.  Islands of Tractability for Parsimony Haplotyping , 2005, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[23]  Paul Scheet,et al.  A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. , 2006, American journal of human genetics.

[24]  A. Verma,et al.  Risk Alleles for Multiple Sclerosis Identified by a Genomewide Study , 2008 .

[25]  Pall I. Olason,et al.  Detection of sharing by descent, long-range phasing and haplotype imputation , 2008, Nature Genetics.