Routine performance and errors of 454 HLA exon sequencing in diagnostics

Background: Next-generation sequencing (NGS) has changed genomics significantly, and more and more applications make use of sequencing on different platforms. Now, in 2012, after a decade of development and evolution, NGS has been accepted in a variety of research fields. Determining sequencing errors is essential for taking next-generation sequencing beyond research use only. This study describes the overall performance of the 454 system across multiple GS Junior runs with an in-house established and validated diagnostic assay for human leukocyte antigen (HLA) exon sequencing. Based on these data, we extracted, evaluated and characterized errors and variants of 60 HLA loci per run with respect to their adjacencies.

Results: We determined an overall error rate of 0.18% in a total of 118,484,408 bases. Of all reads analyzed (n=349,503), 31.3% contain one or more errors. Deletions form the largest group and account for 50% of the errors. Incorrect bases are not distributed equally along sequences and tend to be more frequent at sequence ends. Certain sequence positions in the middle or at the beginning of the read also accumulate errors. Typically, the quality score at the actual error position is lower than the adjacent scores.

Conclusions: Here we present the first error assessment of a human next-generation sequencing diagnostics assay based on an amplicon sequencing approach. The improvements in sequence quality and error rate made over the years are evident, and both have now reached a level where diagnostic applications become feasible. The error rates presented here are lower than previously published ones, and we can confirm and quantify the often-described relation between homopolymers and errors. Nevertheless, a certain depth of coverage is needed, particularly in challenging areas of the sequencing target. Furthermore, the use of error-correcting tools is not essential but might contribute to the capacity and efficiency of a sequencing run.
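To put the reported figures in proportion, the following minimal sketch derives approximate absolute counts from the percentages given in the abstract. It is an illustration, not the authors' analysis: the constant and function names are hypothetical, the published percentages are rounded (so the counts are approximate), the mean read length assumes that all 118,484,408 bases come from the 349,503 analyzed reads, and the Phred-scale value is simply the standard conversion Q = -10 log10(p) applied to the overall error rate.

```python
import math

# Figures quoted in the abstract (percentages are rounded in the source).
TOTAL_BASES = 118_484_408
TOTAL_READS = 349_503
ERROR_RATE = 0.0018                  # 0.18% overall error rate
READS_WITH_ERRORS_FRACTION = 0.313   # 31.3% of reads contain >= 1 error
DELETION_SHARE = 0.50                # deletions account for 50% of errors


def derived_figures():
    """Return approximate quantities implied by the reported numbers."""
    errors = TOTAL_BASES * ERROR_RATE
    return {
        # Approximate number of erroneous bases overall.
        "approx_errors": round(errors),
        # Approximate number of errors that are deletions.
        "approx_deletions": round(errors * DELETION_SHARE),
        # Approximate number of reads with at least one error.
        "approx_reads_with_errors": round(TOTAL_READS * READS_WITH_ERRORS_FRACTION),
        # Assumption: all counted bases belong to the analyzed reads.
        "mean_read_length": TOTAL_BASES / TOTAL_READS,
        # Standard Phred conversion of the overall error probability.
        "phred_equivalent": -10 * math.log10(ERROR_RATE),
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, the reported rate corresponds to roughly 213,000 erroneous bases, about half of them deletions, spread over roughly 109,000 reads with a mean length of about 339 bases.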
