Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing

Rapid advances in high-throughput sequencing facilitate variant discovery and genotyping, but linking variants into a single haplotype remains challenging. Here we demonstrate HaploSeq, an approach for assembling chromosome-scale haplotypes that exploits the existence of 'chromosome territories'. Using proximity ligation and sequencing, we show that alleles on homologous chromosomes occupy distinct territories; as a result, this experimental protocol preferentially recovers DNA variants that are physically linked on the same homolog. Computational analysis of such data sets allows accurate (∼99.5%) reconstruction of chromosome-spanning haplotypes for ∼95% of alleles in hybrid mouse cells with 30× sequencing coverage. To resolve haplotypes for a human genome, which has a lower density of variants, we coupled HaploSeq with local conditional phasing and obtained haplotypes for ∼81% of alleles with ∼98% accuracy from just 17× sequencing. Although methods based on proximity ligation were originally designed to investigate the spatial organization of genomes, our results support their use as a general tool for haplotyping.
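The key property described above is that proximity-ligation read pairs mostly join alleles lying on the same homolog, so each read pair constrains the relative phase of the heterozygous sites it covers. The sketch below illustrates that idea under stated assumptions; it is a simplified greedy illustration, not the computational method used in this work. It assumes each fragment is a list of (SNP index, allele) observations with alleles coded 0/1 at heterozygous sites. For every SNP pair seen together it tallies whether the observed alleles agree, keeps the majority parity, and propagates a phase along a breadth-first spanning tree. Reads that ligate the two homologs would show up as parity errors and are corrected only when enough same-homolog observations cover the same pair. The function names `phase_from_fragments` and `block_accuracy` are hypothetical.

```python
from collections import defaultdict, deque


def phase_from_fragments(fragments):
    """Greedy phasing sketch (hypothetical helper, not the HaploSeq algorithm).

    fragments: list of lists of (snp_index, allele), allele in {0, 1}.
    Returns (phase, block): phase maps snp -> allele on one haplotype,
    block maps snp -> id of its connected phase block.
    """
    # votes[(i, j)] = [count where alleles agree, count where they differ]
    votes = defaultdict(lambda: [0, 0])
    for frag in fragments:
        for x in range(len(frag)):
            for y in range(x + 1, len(frag)):
                (i, ai), (j, aj) = frag[x], frag[y]
                if i == j:
                    continue
                votes[(min(i, j), max(i, j))][ai ^ aj] += 1

    # Keep one edge per SNP pair with its majority parity; drop ties.
    adj = defaultdict(list)
    for (i, j), (same, diff) in votes.items():
        if same == diff:
            continue
        parity = 0 if same > diff else 1
        adj[i].append((j, parity))
        adj[j].append((i, parity))

    # Breadth-first search assigns a phase within each connected block.
    # Edges that conflict with the spanning tree are simply ignored here.
    phase, block = {}, {}
    for start in sorted(adj):
        if start in phase:
            continue
        phase[start] = 0
        block[start] = start
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, p in adj[u]:
                if v not in phase:
                    phase[v] = phase[u] ^ p
                    block[v] = start
                    queue.append(v)
    return phase, block


def block_accuracy(phase, block, truth):
    """Fraction of phased SNPs consistent with truth, allowing one flip per block."""
    groups = defaultdict(list)
    for snp, b in block.items():
        groups[b].append(snp)
    correct = total = 0
    for snps in groups.values():
        agree = sum(phase[s] == truth[s] for s in snps)
        # A block's orientation is arbitrary, so score the better of the two.
        correct += max(agree, len(snps) - agree)
        total += len(snps)
    return correct / total if total else 0.0
```

For example, with a true haplotype `{0: 0, 1: 1, 2: 1, 3: 0}`, short fragments linking neighbouring SNPs plus two long-range pairs joining SNPs 0 and 3 outvote a single inter-homolog pair that reports the opposite parity, and all four SNPs end up in one correctly phased block. A full method would optimize a global objective over all edges (for example, the max-cut formulation used by HapCUT) rather than trusting a single spanning tree.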
