Performance comparison of whole-genome sequencing platforms

Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ∼76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.

[1]  Francisco M. De La Vega,et al.  Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. , 2009, Genome research.

[2]  Jay Shendure,et al.  Identification by whole-genome resequencing of gene defect responsible for severe hypercholesterolemia , 2010, Human molecular genetics.

[3]  Hugo Y. K. Lam,et al.  Performance comparison of exome DNA sequencing technologies , 2011, Nature Biotechnology.

[4]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[5]  T. Das,et al.  Variants in the 10q26 gene cluster (LOC387715 and HTRA1) exhibit enhanced risk of age-related macular degeneration along with CFH in Indian patients. , 2008, Investigative ophthalmology & visual science.

[6]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[7]  A. Butte,et al.  Non-Synonymous and Synonymous Coding SNPs Show Similar Likelihood and Effect Size of Human Disease Association , 2010, PloS one.

[8]  S. Chanock,et al.  Mutations in TERT, the gene for telomerase reverse transcriptase, in aplastic anemia. , 2005, The New England journal of medicine.

[9]  Stephen C. J. Parker,et al.  Accurate and comprehensive sequencing of personal genomes. , 2011, Genome research.

[10]  A. Sparks,et al.  The mutation spectrum revealed by paired genome sequences from a lung cancer patient , 2010, Nature.

[11]  Life Technologies,et al.  A map of human genome variation from population-scale sequencing , 2011 .

[12]  M. DePristo,et al.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.

[13]  Gonçalo R. Abecasis,et al.  The variant call format and VCFtools , 2011, Bioinform..

[14]  Philip M. Kim,et al.  Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome , 2007, Science.

[15]  Alexander A. Morgan,et al.  Clinical assessment incorporating a personal genome , 2010, The Lancet.

[16]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[17]  P. Shannon,et al.  Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing , 2010, Science.

[18]  R. Durbin,et al.  Dindel: accurate indel calls from short-read data. , 2011, Genome research.

[19]  Mark Gerstein,et al.  Personal genome sequencing: current approaches and challenges. , 2010, Genes & development.

[20]  Dmitry Pushkarev,et al.  Single-molecule sequencing of an individual human genome , 2009, Nature Biotechnology.

[21]  J. Lupski,et al.  The complete genome of an individual by massively parallel DNA sequencing , 2008, Nature.

[22]  H. Hakonarson,et al.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data , 2010, Nucleic acids research.

[23]  J. Hoh,et al.  HTRA1 variants in exudative age-related macular degeneration and interactions with smoking and CFH. , 2008, Investigative ophthalmology & visual science.

[24]  Elizabeth M. Smigielski,et al.  dbSNP: the NCBI database of genetic variation , 2001, Nucleic Acids Res..