Impact of next-generation sequencing error on analysis of barcoded plasmid libraries of known complexity and sequence

Barcoded vectors are promising tools for investigating clonal diversity and dynamics in hematopoietic gene therapy. Analysis of clones marked with barcoded vectors requires accurate identification of potentially large numbers of individually rare barcodes whose exact number, sequence identity and abundance are unknown. This is an inherently challenging application, and the feasibility of using contemporary next-generation sequencing technologies for it is unresolved. To explore this potential application empirically, without prior assumptions, we sequenced barcode libraries of known complexity. Libraries containing 1, 10 and 100 Sanger-sequenced barcodes were sequenced using an Illumina platform, and a 100-barcode library was also sequenced using a SOLiD platform. In libraries containing 1 and 10 barcodes, the expected barcodes were distinguished from false barcodes generated by sequencing error by a several log-fold difference in abundance. In 100-barcode libraries, however, expected and false barcodes overlapped in abundance and could not be resolved by bioinformatic filtering and clustering strategies. In independent sequencing runs, multiple false-positive barcodes appeared to be represented at higher abundance than known barcodes, despite their confirmed absence from the original library. Such errors, which potentially impact barcoding studies in an application-dependent manner, are consistent with the existence of both stochastic and systematic error, the mechanisms of which are yet to be fully resolved.
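The abundance-based separation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `split_by_abundance_gap`, the toy barcode sequences and the read counts are all hypothetical, and the rule of cutting at the largest log10 drop in ranked abundance is an assumption chosen for illustration only.

```python
import math
from collections import Counter


def split_by_abundance_gap(reads):
    """Rank barcodes by read count and split at the largest log10 drop.

    Illustrative heuristic only (an assumption, not the published method).
    Returns (high_abundance, low_abundance, gap_in_log10_units).
    """
    counts = Counter(reads).most_common()  # sorted by descending read count
    if len(counts) < 2:
        return counts, [], 0.0
    # log10 fold-change between each barcode and the next one in the ranking
    gaps = [
        math.log10(counts[i][1] / counts[i + 1][1])
        for i in range(len(counts) - 1)
    ]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return counts[: cut + 1], counts[cut + 1:], gaps[cut]


# Hypothetical toy data: two "expected" barcodes and two low-count variants
# that differ from them by a single base, as sequencing error might produce.
reads = ["ACGT"] * 10000 + ["TTGA"] * 8000 + ["ACGA"] * 12 + ["TTGC"] * 9
high, low, gap = split_by_abundance_gap(reads)
print([barcode for barcode, _ in high], [barcode for barcode, _ in low], round(gap, 2))
```

A rule of this kind only works when a clear gap exists, as reported for the 1- and 10-barcode libraries. When expected and false barcodes overlap in abundance, as reported for the 100-barcode libraries, no single cut separates them, and the largest gap may fall in a misleading place.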
