Applying machine learning to predict viral assembly for adeno-associated virus capsid libraries

Abstract

Machine learning (ML) can aid novel discoveries in the field of viral gene therapy. In particular, the large datasets generated by next-generation sequencing (NGS) of complex capsid libraries remain a largely untapped resource for data analysis and prediction. Adeno-associated virus (AAV) capsid libraries are increasingly used to select candidate gene therapy vectors. Such high-complexity AAV capsid libraries have previously been created and selected in vivo; in silico analysis with ML algorithms, however, may help design smarter and more robust libraries for selection. In this study, data from AAV capsid libraries collected before and after viral assembly were used to train ML algorithms. We found that two ML algorithms, artificial neural networks (ANNs) and support vector machines (SVMs), can be trained to predict whether previously unseen capsid variants assemble into viable virus-like structures. Using the most accurate models, we simulated hypothetical mutation patterns in library construction; the simulations suggest that N495, G546, and I554 are important in AAV2-derived capsids. Finally, two comparative libraries were generated from ML-derived data to validate these findings biologically and to demonstrate the predictive power of ML in vector design.
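The abstract does not say how sequences were encoded, how pre- and post-assembly read counts were turned into training labels, or how the models were fitted. The sketch below illustrates one plausible version of that workflow under explicit assumptions, and it is not the authors' pipeline. Specifically, the enrichment rule `label_from_counts` (with its `min_ratio` threshold), the one-hot encoding, and all helper names are hypothetical. The linear SVM, trained by full-batch sub-gradient descent on the L2-regularised hinge loss, is a stand-in for whatever ANN or SVM implementation was actually used. The read counts are invented toy values for four 2-residue variants. The final step, `mutation_scan`, mimics the idea of simulating mutation patterns: it substitutes every residue at every position of a parent sequence and records how often the model predicts assembly.

```python
# Minimal sketch (NOT the authors' pipeline): label capsid variants from
# hypothetical pre-/post-assembly read counts, one-hot encode them, fit a
# linear SVM, and run an in-silico single-site mutation scan.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues


def one_hot(seq):
    """Encode a peptide as a flat vector of 20 binary features per position."""
    vec = []
    for residue in seq:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def label_from_counts(pre_counts, post_counts, min_ratio=0.5):
    """Hypothetical labelling rule: +1 ("assembles") if a variant's read
    frequency after assembly is at least min_ratio times its frequency in the
    pre-assembly library, otherwise -1."""
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    labels = {}
    for variant, n_pre in pre_counts.items():
        f_pre = n_pre / pre_total
        f_post = post_counts.get(variant, 0) / post_total
        labels[variant] = 1 if f_post >= min_ratio * f_pre else -1
    return labels


def train_linear_svm(X, y, lam=0.01, lr=0.5, epochs=500):
    """Full-batch sub-gradient descent on the L2-regularised hinge loss
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                      # the bias is not regularised
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge term active for this variant
                for j, xij in enumerate(xi):
                    grad_w[j] -= yi * xij / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, seq):
    """+1 if the model predicts the variant assembles, else -1."""
    return 1 if dot(w, one_hot(seq)) + b > 0 else -1


def mutation_scan(w, b, parent):
    """Fraction of the 20 residues (parental one included) predicted to
    assemble when substituted at each position of the parent sequence."""
    scan = []
    for pos in range(len(parent)):
        hits = sum(
            predict(w, b, parent[:pos] + aa + parent[pos + 1:]) == 1
            for aa in AMINO_ACIDS
        )
        scan.append(hits / len(AMINO_ACIDS))
    return scan


# Toy, invented read counts for four 2-residue variants (not study data).
pre = {"AG": 100, "SG": 100, "AA": 100, "SA": 100}
post = {"AG": 180, "SG": 150, "AA": 5}
labels = label_from_counts(pre, post)
X = [one_hot(v) for v in labels]
y = [labels[v] for v in labels]
w, b = train_linear_svm(X, y)
scan = mutation_scan(w, b, "AG")
print(labels)
print(scan)
```

In this toy run, only the second position is sensitive to substitution, because the invented labels depend solely on a G at that position. A real analysis along these lines would score the variable region of AAV2-derived capsids with many more sequenced variants and a validated classifier.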
