Deep learning enables therapeutic antibody optimization in mammalian cells by deciphering high-dimensional protein sequence space

Therapeutic antibody optimization is time- and resource-intensive, largely because it requires low-throughput screening (~10^3 variants) of full-length IgG in mammalian cells, typically resulting in only a few optimized leads. Here, we use deep learning to interrogate and predict antigen specificity from a massive diversity of antibody sequence space. Using a mammalian display platform and the therapeutic antibody trastuzumab, rationally designed site-directed mutagenesis libraries are introduced by CRISPR/Cas9-mediated homology-directed repair (HDR). Screening and deep sequencing of relatively small libraries (~10^4 variants) produced high-quality data capable of training deep neural networks that accurately predict antigen binding based on antibody sequence (~85% precision). Deep learning is then used to predict millions of antigen binders from an in silico library of ~10^8 variants. Finally, these variants are subjected to multiple developability filters, resulting in tens of thousands of optimized lead candidates; when a small subset of 30 candidates was expressed, all 30 were antigen-specific. With its scalability and capacity to interrogate a vast protein sequence space, deep learning offers great potential for antibody engineering and optimization.
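The predict-then-filter workflow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three-position toy library, and the `toy_binding_score` stand-in (which simply counts aromatic residues) are all hypothetical and chosen for illustration only. In the actual study a trained neural network would supply the binding probability, and the in silico library would be far larger.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Flatten a sequence into a one-hot vector (20 entries per position)."""
    vec = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(aa)] = 1
        vec.extend(row)
    return vec


def enumerate_library(allowed):
    """Yield every combination of the residues allowed at each position."""
    for combo in product(*allowed):
        yield "".join(combo)


def toy_binding_score(seq):
    """Hypothetical stand-in for a trained classifier's P(binder).

    Returns the fraction of aromatic residues; a real model would
    take the one-hot encoding as input instead.
    """
    return sum(aa in "FWY" for aa in seq) / len(seq)


def predict_binders(library, score_fn, threshold):
    """Keep the sequences whose score meets the decision threshold."""
    return [seq for seq in library if score_fn(seq) >= threshold]


# Illustrative per-position residue choices (assumed, not from the paper).
allowed = ["WA", "GA", "FY"]
library = list(enumerate_library(allowed))
binders = predict_binders(library, toy_binding_score, threshold=0.5)
encoded = one_hot(binders[0])
```

Enumerating the library lazily with a generator matters at realistic scale: a library of ~10^8 variants would normally be scored in batches rather than materialized as a list, as is done here only because the toy library has eight members.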
