Accurate and comprehensive sequencing of personal genomes.

As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ∼30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses of a clinical sample sequenced on two related Illumina platforms, GAII(x) and HiSeq 2000, to a very high depth (126×). We used these data to establish genotype-calling filters that dramatically increase accuracy. We also empirically determined how the callable portion of the genome varies as a function of the amount of sequence data used. These results help provide a "sequencing guide" for future whole-genome sequencing decisions and metrics by which coverage statistics should be reported.
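For intuition about why mean coverage alone can mislead, the sketch below computes the fraction of positions expected to reach a minimum read depth under an idealized Poisson model of uniform random sampling. This is not the authors' method: the abstract describes an empirical determination of the callable fraction, and real data typically fall short of this idealized bound because of factors such as GC bias and mapping ambiguity. The function name `fraction_at_least` and the depth threshold of 10 are hypothetical choices made for illustration only.

```python
import math


def fraction_at_least(mean_depth: float, min_depth: int) -> float:
    """Return P(X >= min_depth) for X ~ Poisson(mean_depth).

    Idealized model: reads land uniformly at random, so per-base depth
    is Poisson-distributed around the mean coverage.
    """
    term = math.exp(-mean_depth)  # P(X = 0)
    below = 0.0
    # Accumulate P(X = k) for k = 0 .. min_depth - 1.
    for k in range(min_depth):
        below += term
        term *= mean_depth / (k + 1)  # advance to P(X = k + 1)
    return 1.0 - below


if __name__ == "__main__":
    # Hypothetical threshold: require at least 10 reads to call a genotype.
    for mean_depth in (10, 20, 30, 50):
        print(mean_depth, round(fraction_at_least(mean_depth, 10), 4))
```

Comparing this theoretical curve with the observed fraction of callable positions at each depth would show how far a given data set departs from uniform sampling.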
