Optimal trade-off control in machine learning-based library design, with application to adeno-associated virus (AAV) for gene therapy

Adeno-associated viruses (AAVs) hold tremendous promise as delivery vectors for clinical gene therapy, but they need improvement. AAVs with enhanced properties, such as more efficient and/or cell-type-specific infection, can be engineered by creating a large, diverse starting library and screening it for desired phenotypes, in some cases iteratively. Although this approach has succeeded in numerous specific cases, such as infecting cell types from the brain to the lung, the starting libraries often contain a high proportion of variants that cannot assemble or package their genomes, a prerequisite for any gene delivery engineering goal. Herein, we develop and showcase a machine learning (ML)-based method for systematically designing more effective starting libraries: ones that have broadly good packaging capabilities while being as diverse as possible. Such carefully designed but general libraries stand to significantly increase the chance of success in engineering any property of interest. Furthermore, we use this approach to design a clinically relevant AAV peptide insertion library that achieves 5-fold higher packaging fitness than the state-of-the-art library, with negligible reduction in diversity. We demonstrate the general utility of this designed library on a downstream task to which our approach was agnostic: infection of primary human brain tissue. The ML-designed library had approximately 10-fold more successful variants than the current state-of-the-art library. Not only should our new library prove useful for any number of other engineering goals, but our library design approach itself can also be applied to other types of libraries for AAV and beyond.
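The trade-off between packaging fitness and diversity can be illustrated with a toy calculation. The abstract does not specify the objective, the predictive model, or how the library is parameterized, so the following is only a minimal sketch under stated assumptions: a library in which each position is sampled independently, a hypothetical additive per-residue fitness score (`toy_scores`), Shannon entropy as the diversity measure, and a hypothetical trade-off weight `lam`. Under these assumptions, the distribution that maximizes expected score plus `lam` times entropy is a per-position softmax of the scores divided by `lam`.

```python
import math

def maxent_library(scores, lam):
    """Per-position residue distributions maximizing E[score] + lam * entropy.

    Assumes independent positions and additive scores (illustrative only).
    The maximizer at each position is softmax(score / lam).
    """
    library = []
    for pos_scores in scores:
        top = max(pos_scores.values())  # subtract the max for numerical stability
        weights = {aa: math.exp((s - top) / lam) for aa, s in pos_scores.items()}
        total = sum(weights.values())
        library.append({aa: w / total for aa, w in weights.items()})
    return library

def expected_score(library, scores):
    """Mean additive score of a sequence drawn from the library."""
    return sum(p[aa] * s[aa] for p, s in zip(library, scores) for aa in p)

def entropy(library):
    """Shannon entropy (nats) of the library, summed over independent positions."""
    return -sum(q * math.log(q) for p in library for q in p.values() if q > 0)

# Hypothetical scores for a two-position insertion; not data from the paper.
toy_scores = [
    {"A": 1.0, "G": 0.2, "P": -1.5},
    {"K": 0.5, "R": 0.4, "W": -2.0},
]

for lam in (0.1, 1.0, 10.0):
    lib = maxent_library(toy_scores, lam)
    print(f"lam={lam}: mean score={expected_score(lib, toy_scores):.3f}, "
          f"entropy={entropy(lib):.3f}")
```

Sweeping `lam` traces out a curve: small values concentrate the library on the highest-scoring residues, while large values approach a uniform, maximally diverse library with a lower mean score.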
