Convergent selection in antibody repertoires is revealed by deep learning

Adaptive immunity is driven by the ability of lymphocytes to undergo V(D)J recombination, which generates a highly diverse set of immune receptors (B cell receptors/secreted antibodies and T cell receptors), and by their subsequent clonal selection and expansion upon molecular recognition of foreign antigens. These principles give rise to remarkable, unique and dynamic immune receptor repertoires¹. Deep sequencing provides increasing evidence for commonly shared (convergent) receptors across individuals of the same species²⁻⁴. Convergent selection of specific receptors towards various antigens offers one explanation for these findings. For example, isolated cases of convergence have been reported in antibody repertoires in the context of viral infection or allergy⁵⁻⁸. Recent studies demonstrate that convergent selection of sequence motifs within T cell receptor (TCR) repertoires can be identified on an even wider scale⁹,¹⁰. Here we report extensive convergent selection in the antibody repertoires of mice for a range of protein antigens and immunization conditions. We employed a deep learning approach based on variational autoencoders (VAEs) to model the underlying process of B cell receptor (BCR) recombination, assuming that data generation follows a Gaussian mixture model (GMM) in latent space. This provides both a latent embedding and cluster labels that group similar sequences, enabling the discovery of a multitude of convergent, antigen-associated sequence patterns. Using a linear, one-versus-all support vector machine (SVM), we confirm that the identified sequence patterns are predictive of antigenic exposure and outperform predictions based on the occurrence of public clones. Recombinant expression of both natural and in silico-generated antibodies possessing convergent patterns confirms their binding specificity to target antigens. Our work highlights the extent to which convergence can occur in antibody repertoires and shows how deep learning can be applied to immunodiagnostics and to antibody discovery and engineering.
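
As an illustration of how a Gaussian mixture in latent space yields cluster labels, the following minimal Python sketch assigns latent points to the mixture component with the highest posterior probability. It is not the model used in this work: the mixture weights, means and variances are fixed toy values rather than parameters learned jointly with a VAE, the latent space is two-dimensional, and the names `cluster_posteriors`, `assign_cluster` and the sequence identifiers are hypothetical.

```python
import math


def log_gaussian_diag(z, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at point z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def cluster_posteriors(z, weights, means, variances):
    """Posterior p(cluster | z) under a Gaussian mixture, using log-sum-exp."""
    log_joint = [
        math.log(w) + log_gaussian_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)
    log_norm = top + math.log(sum(math.exp(lj - top) for lj in log_joint))
    return [math.exp(lj - log_norm) for lj in log_joint]


def assign_cluster(z, weights, means, variances):
    """Index of the mixture component with the highest posterior for z."""
    post = cluster_posteriors(z, weights, means, variances)
    return max(range(len(post)), key=post.__getitem__)


# Toy mixture with two components (assumed values, not fitted to any data).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical latent embeddings of three sequences.
latent_points = {
    "seq_a": [0.2, -0.1],
    "seq_b": [3.8, 4.3],
    "seq_c": [0.5, 0.4],
}

labels = {
    name: assign_cluster(z, weights, means, variances)
    for name, z in latent_points.items()
}
print(labels)
```

In this toy setting, sequences whose embeddings fall near the same component mean receive the same label. Per-repertoire counts of such labels are one plausible form of input for a linear, one-versus-all classifier, although the exact feature construction is not specified in the text above.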

[1] S. Quake, et al. Memory B Cell Activation, Broad Anti-influenza Antibodies, and Bystander Activation Revealed by Single-Cell Transcriptomics, 2020, Cell reports.

[2] S. Reddy, et al. Antibody discovery and engineering by enhanced CRISPR-Cas9 integration of variable gene cassette libraries in mammalian cells, 2019, mAbs.

[3] Jonathan R. McDaniel, et al. Functional Interrogation and Mining of Natively-Paired Human VH:VL Antibody Repertoires, 2017, Nature Biotechnology.

[4] Cédric R. Weber, et al. High-throughput antibody engineering in mammalian cells by CRISPR/Cas9-mediated homology-directed mutagenesis, 2018, bioRxiv.

[5] Debora S Marks, et al. Deep generative models of genetic variation capture the effects of mutations, 2018, Nature Methods.

[6] S. Reddy, et al. Boosting subdominant neutralizing antibody responses with a computationally designed epitope-focused immunogen, 2018, bioRxiv.

[7] Enkelejda Miho, et al. Bioinformatic and Statistical Analysis of Adaptive Immune Repertoires, 2015, Trends in immunology.

[8] Julian Q. Zhou, et al. Cutting Edge: Ig H Chains Are Sufficient to Determine Most B Cell Clonal Relationships, 2019, The Journal of Immunology.

[9] William S. DeWitt, et al. Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire, 2017, Nature Genetics.

[10] Joseph G. Jardine, et al. HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen, 2016, Science.

[11] Francois Vigneault, et al. Hierarchical Clustering Can Identify B Cell Clones with High Confidence in Ig Repertoire Sequencing Data, 2017, The Journal of Immunology.

[12] Marie-Paule Lefranc, et al. Nomenclature of the Human Immunoglobulin Heavy (IGH) Genes, 2001, Experimental and Clinical Immunogenetics.

[13] Martin A. Nowak, et al. Variational auto-encoding of protein sequences, 2017, ArXiv.

[14] Huachun Tan, et al. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering, 2016, IJCAI.

[15] Alessandro Sette, et al. Identifying specificity groups in the T cell receptor repertoire, 2017, Nature.

[16] V. Greiff, et al. A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status, 2015, Genome Medicine.

[17] Lynn Morris, et al. Multi-Donor Longitudinal Antibody Repertoire Sequencing Reveals the Existence of Public Antibody Clonotypes in HIV-1 Infection, 2018, Cell host & microbe.

[18] P. Bradley, et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires, 2017, Nature.

[19] D. G. Gibson, et al. Enzymatic assembly of DNA molecules up to several hundred kilobases, 2009, Nature Methods.

[20] Scott D Boyd, et al. Convergent antibody signatures in human dengue, 2013, Cell host & microbe.

[21] D. Burton, et al. Commonality despite exceptional diversity in the baseline human antibody repertoire, 2018, Nature.

[22] Stephen L. Hauser, et al. Naive antibody gene-segment frequencies are heritable and unaltered by chronic lymphocyte ablation, 2011, Proceedings of the National Academy of Sciences.

[23] Cédric R. Weber, et al. Systems Analysis Reveals High Genetic and Antigen-Driven Predetermination of Antibody Repertoires throughout B Cell Development, 2017, Cell reports.

[24] Sai T Reddy, et al. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting, 2016, Science Advances.

[25] D. Koller, et al. High-resolution antibody dynamics of vaccine-induced immune responses, 2014, Proceedings of the National Academy of Sciences.

[26] Spyros Darmanis, et al. High-affinity allergen-specific human antibodies cloned from single IgE B cell transcriptomes, 2018, Science.

[27] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[28] Sai T. Reddy, et al. Immunogenomic engineering of a plug-and-(dis)play hybridoma platform, 2016, Nature Communications.

[29] K Dane Wittrup, et al. Biophysical properties of the clinical-stage antibody landscape, 2017, Proceedings of the National Academy of Sciences.

[30] Johannes Trück, et al. Identification of Antigen-Specific B Cell Receptor Sequences Using Public Repertoire Analysis, 2015, The Journal of Immunology.

[31] Sai T Reddy, et al. Advanced Methodologies in High-Throughput Sequencing of Immune Repertoires, 2017, Trends in biotechnology.

[32] Seung Hyun Kang, et al. Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells, 2010, Nature Biotechnology.

[33] James E. Crowe, et al. High frequency of shared clonotypes in human B cell receptor repertoires, 2019, Nature.

[34] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.

[35] William S. DeWitt, et al. Deep generative models for T cell receptor protein sequences, 2019, eLife.