Quantifying the nativeness of antibody sequences using long short-term memory networks

Abstract: Antibodies often undergo substantial engineering en route to a therapeutic candidate with good developability properties. Characterization of antibody libraries has shown that retaining native-like sequences improves the overall quality of the library. Motivated by recent advances in deep learning, we developed a bi-directional long short-term memory (LSTM) network model that exploits the large amount of available antibody sequence information, and we use this model to quantify the nativeness of antibody sequences. The model scores sequences for their similarity to naturally occurring antibodies, and this score can be used as a consideration during the design and engineering of libraries. We demonstrate the performance of this approach by training a model on human antibody sequences, and we show that our method outperforms other approaches at distinguishing human antibodies from those of other species. We also show the applicability of this method to the evaluation of synthesized antibody libraries and the humanization of mouse antibodies.
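The abstract does not say how the nativeness score is computed from the model's output. One plausible reading, offered here only as an assumption, is that a sequence model assigns a probability to each residue and the score is the mean negative log-probability per residue, so that lower values indicate more native-like sequences. The sketch below illustrates only that scoring step. It does not implement a bi-directional LSTM: a simple per-position frequency table stands in for the model, and the function names (`fit_position_frequencies`, `nativeness_score`) and the short toy strings are hypothetical, not data or code from the paper.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_position_frequencies(aligned_seqs, pseudocount=1.0):
    """Stand-in 'model' (assumption): per-position residue probabilities
    estimated from equal-length sequences, smoothed with a pseudocount."""
    length = len(aligned_seqs[0])
    tables = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned_seqs:
            counts[seq[pos]] += 1.0
        total = sum(counts.values())
        tables.append({aa: c / total for aa, c in counts.items()})
    return tables


def nativeness_score(seq, tables):
    """Mean negative log-probability per residue (assumed scoring rule).
    Lower scores mean the sequence looks more like the training set."""
    nll = [-math.log(tables[i][aa]) for i, aa in enumerate(seq)]
    return sum(nll) / len(nll)


# Toy, made-up 7-residue strings used purely for illustration.
training = ["EVQLVES", "QVQLVQS", "EVQLLES", "QVQLQES"]
tables = fit_position_frequencies(training)

native_like = nativeness_score("EVQLVES", tables)
scrambled = nativeness_score("WWPCMHK", tables)
print(native_like < scrambled)
```

In a model of the kind the abstract describes, the per-position probabilities would come from the network's predictions conditioned on the rest of the sequence rather than from independent position counts; the averaging step would stay the same under this assumption.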
