The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes

Background
Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these genomes were sequenced without experimental haplotype data and therefore lack an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information.

Findings
As part of the Personal Genome Project (PGP), blood samples from 184 participants were collected and processed using Complete Genomics’ Long Fragment Read (LFR) technology. Here, we present the experimental whole-genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and is intended to ensure high-quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data. The remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics’ standard non-barcoded library process. In addition, we report 2.6 million high-quality rare variants not previously identified in the Single Nucleotide Polymorphism Database (dbSNP) or the 1000 Genomes Project Phase 3 data.

Conclusions
These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.
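A variant counts as "not previously identified" when it is absent from the existing catalogs. The following minimal Python sketch illustrates that set-membership test. It is not the authors' pipeline. It assumes each variant is keyed by chromosome, position, reference allele and alternate allele. It also assumes alleles are already normalized so that equivalent variants share a key, and that dbSNP and 1000 Genomes Phase 3 sites have been loaded into in-memory sets. The `Variant` type and the `novel_variants` helper are hypothetical names, and the coordinates are toy values. The quality and allele-frequency filtering implied by "high-quality rare" is not modeled.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, 1-based position, REF and ALT alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def novel_variants(calls: Iterable[Variant], *catalogs: set[Variant]) -> list[Variant]:
    """Return the calls that appear in none of the given reference catalogs.

    Assumption: alleles are already normalized (e.g. left-aligned), so the same
    variant has an identical key in the call set and in every catalog.
    """
    # Merge all catalogs into one set so each lookup is a single hash check.
    known: set[Variant] = set().union(*catalogs)
    # Keep call order; a call is novel only if no catalog contains its exact key.
    return [v for v in calls if v not in known]


if __name__ == "__main__":
    # Toy stand-ins for dbSNP and 1000 Genomes Phase 3 (not real entries).
    dbsnp = {Variant("chr1", 100, "A", "G")}
    kg_phase3 = {Variant("chr1", 200, "C", "T")}
    calls = [
        Variant("chr1", 100, "A", "G"),  # present in the dbSNP stand-in
        Variant("chr1", 200, "C", "T"),  # present in the 1000 Genomes stand-in
        Variant("chr1", 300, "G", "C"),  # present in neither
    ]
    print(novel_variants(calls, dbsnp, kg_phase3))
    # prints [Variant(chrom='chr1', pos=300, ref='G', alt='C')]
```

Merging the catalogs into one set keeps each lookup constant-time on average. For whole-genome call sets, a production pipeline would more likely stream coordinate-sorted VCF files than hold every known site in memory.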
