Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

Next-generation sequencing techniques empower the selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive rounds of selection. Identification of errors in deep-sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, compared with the complexity of long genetic reads and genomes, allowed us to describe the library within a convenient linear vector-and-operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing can be described as the product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias or errors is Seq = Sa I_N, where I_N is the N × N identity (unity) matrix. Any bias in sequencing replaces I_N with a non-identity matrix. We identified a diagonal censorship matrix (CEN), which describes the elimination, or statistically significant downsampling, of specific reads during the sequencing process.
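
The framework summarized above can be written compactly as follows. This is a sketch of one reading of the abstract. It assumes that CEN takes the place of I_N in Seq and that each diagonal entry c_i lies between 0 and 1. The value of N assumes that all 20 proteinogenic amino acids are allowed at each of the seven positions.

```latex
\begin{aligned}
\mathbf{n} &= \lVert n_i \rVert = (n_1, n_2, \ldots, n_N)^{\mathsf{T}},
  \qquad N = 20^{7} = 1.28 \times 10^{9},\\
\mathrm{Seq} &= \mathrm{Sa}\, I_N \quad \text{(sequencing without bias or errors)},\\
\mathrm{Seq} &= \mathrm{Sa}\,\mathrm{CEN}, \qquad
  \mathrm{CEN} = \operatorname{diag}(c_1, c_2, \ldots, c_N), \quad 0 \le c_i \le 1.
\end{aligned}
```

Under this reading, c_i = 1 leaves the ith sequence untouched, c_i = 0 eliminates it from the reads, and 0 < c_i < 1 downsamples it. The observed read vector is then Seq n.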

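To make the operator notation concrete, the Python sketch below simulates Seq on a toy library. It stores each diagonal matrix as a list of its diagonal entries. The multinomial read-sampling model, the reading Seq = Sa CEN, and the function names `apply_diagonal`, `sampling_operator`, and `sequence` are hypothetical illustrations, not the authors' implementation. A real 7-mer library would need a sparse representation instead of dense lists.

```python
import random
from collections import Counter


def apply_diagonal(diag, vec):
    """Multiply a diagonal matrix, stored as its diagonal, by a vector."""
    return [d * x for d, x in zip(diag, vec)]


def sampling_operator(n, depth, rng):
    """Diagonal of one random draw of the sampling matrix Sa for library n.

    Hypothetical model: sampling takes `depth` reads with replacement, and each
    read is sequence i with probability n[i] / sum(n). Setting
    Sa[i][i] = k_i / n[i] makes (Sa n)_i = k_i, the sampled copy number.
    """
    if sum(n) <= 0:
        return [0.0] * len(n)
    counts = Counter(rng.choices(range(len(n)), weights=n, k=depth))
    return [counts[i] / n[i] if n[i] > 0 else 0.0 for i in range(len(n))]


def sequence(n, depth, rng, cen=None):
    """Apply Seq = Sa CEN to library vector n; cen=None means CEN = I_N."""
    # Assumed CEN semantics: c_i = 0 eliminates, 0 < c_i < 1 downsamples.
    weighted = list(n) if cen is None else apply_diagonal(cen, n)
    sa = sampling_operator(weighted, depth, rng)
    # Round to strip floating-point error; entries are integer read counts.
    return [round(x) for x in apply_diagonal(sa, weighted)]


if __name__ == "__main__":
    rng = random.Random(0)
    library = [100, 50, 0, 10]  # toy library with N = 4
    print(sequence(library, 1000, rng))
    print(sequence(library, 1000, rng, cen=[1.0, 1.0, 1.0, 0.0]))
```

In both calls the reads sum to the sequencing depth, and a sequence absent from the library never appears. In the second call, the censored fourth sequence always receives zero reads, while its share of the depth is redistributed to the remaining sequences.
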