De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries

Interpreting epistatic interactions is crucial for understanding the evolutionary dynamics of complex genetic systems and for unveiling the structure and function of genetic pathways. Although high-resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable for assessing functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy with de novo assembly of short-read data. We first validated JigsawSeq on small sub-pools and observed high precision and recall under various experimental settings. With extensive simulations, we then applied JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.
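As an illustration of the validation metrics mentioned above, the following minimal sketch shows one way precision and recall could be computed for a set of assembled full-length variants against a known sub-pool. The abstract does not state how the metrics were defined, so the exact-match criterion, the function name `precision_recall`, and the toy sequences are all assumptions made for this example.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for called variant sequences.

    Assumption: a called variant counts as correct only if its full-length
    sequence exactly matches a sequence in the known (truth) pool.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy sub-pool of four known variants.
truth = {"ATGGCTAAA", "ATGGCCAAA", "ATGGCGAAG", "ATGGCAAAG"}
# Hypothetical assembler output: three correct variants and one false call.
called = {"ATGGCTAAA", "ATGGCCAAA", "ATGGCGAAG", "ATGTTTAAA"}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four called sequences are in the known pool and three of the four known variants are recovered, so both metrics come out at 0.75.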
