OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs

Abstract

Motivation: High-throughput sequencing of large immune repertoires has enabled the development of methods to predict the probability that V(D)J recombination generates a T- or B-cell receptor with any specific nucleotide sequence. These generation probabilities vary widely, ranging over 20 orders of magnitude in real repertoires. Since the function of a receptor depends on its protein sequence, it is important to be able to predict this probability of generation at the amino acid level. However, brute-force summation over all the nucleotide sequences with the correct amino acid translation is computationally intractable. The purpose of this paper is to present a solution to this problem.

Results: We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 amino acid sequence or motif, with or without V/J restriction, as a result of V(D)J recombination in B or T cells. We apply it to databases of epitope-specific T-cell receptors to evaluate the probability that a typical human subject will possess T cells responsive to specific disease-associated epitopes. The model prediction shows excellent agreement with published data. We suggest that OLGA may be a useful tool to guide vaccine design.

Availability and implementation: Source code is available at https://github.com/zsethna/OLGA.

Supplementary information: Supplementary data are available at Bioinformatics online.
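The contrast between brute-force summation and dynamic programming can be illustrated with a toy sketch. This is not OLGA's algorithm or its recombination model: it assumes a hypothetical first-order Markov model over nucleotides (the `P0` and `T` values below are invented for illustration) and ignores V, D and J gene choice, deletions and insertions. All function names are likewise hypothetical. It shows only that summing over synonymous nucleotide sequences can be done one codon at a time, carrying a small forward vector instead of enumerating every sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for i, (a, b, c) in enumerate(product(BASES, repeat=3)):
    CODONS.setdefault(AMINO[i], []).append(a + b + c)

# Hypothetical toy model (an assumption, not from the paper): uniform first
# nucleotide, then a first-order Markov chain whose rows each sum to 1.
P0 = {b: 0.25 for b in BASES}
T = {
    "T": {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4},
    "C": {"T": 0.4, "C": 0.1, "A": 0.2, "G": 0.3},
    "A": {"T": 0.3, "C": 0.4, "A": 0.1, "G": 0.2},
    "G": {"T": 0.2, "C": 0.3, "A": 0.4, "G": 0.1},
}

def nt_prob(seq):
    """Probability of one nucleotide sequence under the toy Markov model."""
    p = P0[seq[0]]
    for x, y in zip(seq, seq[1:]):
        p *= T[x][y]
    return p

def pgen_brute(aa):
    """Sum over every nucleotide sequence translating to the non-empty string aa.

    The number of terms is the product of codon degeneracies, so the cost
    grows exponentially with the length of aa.
    """
    return sum(nt_prob("".join(cs)) for cs in product(*(CODONS[a] for a in aa)))

def pgen_dp(aa):
    """Same sum, accumulated codon by codon for a non-empty string aa.

    fwd[b] is the total probability of all nucleotide prefixes that encode
    the amino acids seen so far and end in nucleotide b. Cost is linear in
    the length of aa.
    """
    fwd = None
    for a in aa:
        new = dict.fromkeys(BASES, 0.0)
        for c in CODONS[a]:
            within = T[c[0]][c[1]] * T[c[1]][c[2]]
            if fwd is None:
                entry = P0[c[0]]
            else:
                entry = sum(fwd[b] * T[b][c[0]] for b in BASES)
            new[c[2]] += entry * within
        fwd = new
    return sum(fwd.values())
```

Under this toy model, `pgen_brute("CASSL")` and `pgen_dp("CASSL")` compute the same quantity, but the first enumerates 2 x 4 x 6 x 6 x 6 = 1728 nucleotide sequences, while the second performs five small per-codon updates.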
