Huge Overlap of Individual TCR Beta Repertoires

It has been reported that human TCR repertoires commonly carry so-called public clonotypes – CDR3 variants that are often shared between individuals. Cross-comparison of individual immune repertoires has previously revealed a population of TCR beta CDR3 variants that are identical at the amino acid level between any two donors (1–3). The lower bound for the total overlap between the TCR beta repertoires of any two donors within their CD8+ naive T cell subset has been estimated at ~14,000 identical amino acid CDR3 variants, based on a comparison of 200,000–600,000 individual TCR beta clonotypes (1). Here, we used deep profiling data comprising 1–2 × 10⁶ individual TCR beta clonotypes per donor, obtained from healthy donors (4), to better estimate the total overlap between the TCR beta repertoires of any two individuals. The apparent paradox is that the deeper we sequence, the larger the percentage of overlapping clonotypes we observe between two repertoires, because the number of possible sequence pairs between the two sets grows quadratically with sample size. To demonstrate this, we analyzed TCR beta repertoires for 12 unrelated donor pairs assembled from a total of nine human donors [adults and children; see Ref. (4) for details]. We plotted the number of identical variants found in samples of increasing size, with up to 10⁶ unique CDR3 sequences randomly drawn from the repertoire of each individual in a given pair (Figure 1). For every pair, the number of shared clonotypes grew approximately quadratically with sample size (Figures 1A–C, colored lines). At maximum sequencing depth (~1 × 10⁶ unique sequences per donor), we observed on average ~72,000, 68,000, and 6,000 CDR3 variants that were identical at the amino acid, amino acid only/non-nucleotide, and nucleotide level, respectively. This exceeds previous estimates (1) by several-fold.
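The nested subsampling described above can be illustrated with a short Python sketch. It is not the pipeline used in this study: the function `overlap_curve`, the choice to shuffle each repertoire once and take prefixes of increasing length, and the toy repertoires (integers standing in for amino acid CDR3 sequences drawn uniformly from a hypothetical shared pool) are all assumptions. In that idealized uniform model, two random samples of n sequences from a pool of P variants share about n²/P sequences on average, which is the quadratic growth described in the text.

```python
import random


def overlap_curve(rep_a, rep_b, sizes, seed=0):
    """Count CDR3s shared between the first n unique sequences of two repertoires.

    Each repertoire is deduplicated, sorted (for reproducibility), and shuffled
    once; prefixes of increasing length are then compared, so the samples are
    nested and the resulting curve never decreases.
    """
    rng = random.Random(seed)
    a = sorted(set(rep_a))
    b = sorted(set(rep_b))
    rng.shuffle(a)
    rng.shuffle(b)
    curve = []
    for n in sizes:
        shared = set(a[:n]) & set(b[:n])  # identical variants in both samples
        curve.append((n, len(shared)))
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: two donors each carry 50,000 variants drawn
    # uniformly from a shared pool of 200,000 (not study data).
    pool = range(200_000)
    rng = random.Random(1)
    donor_a = rng.sample(pool, 50_000)
    donor_b = rng.sample(pool, 50_000)
    for n, shared in overlap_curve(donor_a, donor_b, [10_000, 20_000, 40_000]):
        print(n, shared)
```

In the toy run, each doubling of n should multiply the overlap by roughly 4; real repertoires are far from uniform, so only the shape of the curve, not its scale, carries over.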
The greatest overlap was between two donors from whom we obtained ~1 × 10⁶ and 1.7 × 10⁶ CDR3 variants; for this pair we observed 113,000, 108,000, and 11,000 identical clonotypes at the amino acid, amino acid only/non-nucleotide, and nucleotide level, respectively.
Figure 1. Overlap of individual TCR beta CDR3 repertoires grows approximately in proportion to the number of sequence pairs sampled. Plots indicate the number of shared sequences for 12 unrelated donor pairs in relation to sample size at the level of (A) all amino acid sequences, ...
The lower bound on total individual TCR beta repertoire diversity has previously been estimated at 5 × 10⁶ unique clonotypes [Ref. (5) and our unpublished data]. With that in mind, we extrapolated our intersection curves by fitting them to a power-law model [Y = aXᵇ, as in Ref. (1)], which yielded exponents b close to 2.0 and R² > 0.999 in all cases (Figures 1A–C, dashed lines). We estimated that the total overlap of the TCR beta CDR3 repertoires of two individuals comprises ~2,200,000, 2,060,000, and 180,000 variants, i.e., 44.1, 41.3, and 3.6% of a given individual's sequence diversity at the amino acid, amino acid only/non-nucleotide, and nucleotide level, respectively. Thus, the real paradox is that nearly half of the TCR beta CDR3 repertoire is functionally identical between any two individuals, even though the theoretical diversity achievable by TCR beta variants has been estimated at ~5 × 10¹¹ sequences (1, 6). These extrapolated values follow directly from the fitted curves. We took numerous precautions to exclude contamination, including sequencing the repertoires of donors analyzed as pairs in separate Illumina lanes (4). Even if contamination were present, it would not affect overlap at the amino acid only/non-nucleotide level (Figure 1B), because cross-sample contamination transfers identical nucleotide sequences, whereas this category counts only variants that share an amino acid sequence but differ at the nucleotide level.
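The extrapolation step can be sketched as a least-squares fit of log Y against log X, which turns Y = aXᵇ into a straight line with slope b. This is a minimal Python illustration under stated assumptions: the text and Ref. (1) do not specify whether the fit was performed on log-transformed data or by nonlinear least squares, so `fit_power_law`, its log-scale R², and the synthetic curve below (generated with b = 2 exactly) are hypothetical and will not reproduce the published coefficients.

```python
import math


def fit_power_law(xs, ys):
    """Fit Y = a * X**b by ordinary least squares on log Y versus log X.

    Returns (a, b, r2); r2 is the coefficient of determination on the log scale.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx                  # slope of the log-log line = exponent b
    log_a = my - b * mx            # intercept = log a
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(lx, ly))
    ss_tot = sum((y - my) ** 2 for y in ly)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


def extrapolate(a, b, x):
    """Predicted overlap at sample size x under the fitted power law."""
    return a * x ** b


if __name__ == "__main__":
    # Hypothetical overlap curve with b = 2 exactly (not study data).
    sizes = [1e5, 2e5, 5e5, 1e6]
    shared = [7.2e-8 * x ** 2 for x in sizes]
    a, b, r2 = fit_power_law(sizes, shared)
    total = extrapolate(a, b, 5e6)  # assumed repertoire size of 5 x 10^6
    print(f"a={a:.3g} b={b:.3f} R2={r2:.4f} overlap at 5e6={total:,.0f}")
    print(f"fraction of repertoire shared: {total / 5e6:.1%}")
```

With the study's own numbers, the final step is plain arithmetic: an extrapolated amino acid overlap of ~2,200,000 variants divided by the assumed lower bound of 5 × 10⁶ clonotypes per individual gives ~44%, in line with the 44.1% reported above.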
As a further safeguard, we performed CDR3 extraction and error correction with MiTCR (http://mitcr.milaboratory.com/) using the stringent ETE algorithm, which eliminates 98% of PCR and sequencing errors with minimal loss of natural TCR beta diversity (7). Such a large overlap between individuals suggests the existence of a rather limited pool of frequently used functional CDR3 sequences. To investigate this further, we calculated the lower and upper bounds of the Chao richness estimate as described in Ref. (8), based on the numbers of singletons and doubletons (sequences observed in one and two individuals, respectively) in the paired samples of the 12 donor pairs. This model yielded a confidence interval of 1.2 × 10⁷ to 5.4 × 10⁷ unique amino acid CDR3 sequences at a significance level of α = 0.001. These findings represent a shift in our understanding of human adaptive immunity. It now appears likely that recombinatorial biases (3, 9) and thymic selection (4, 10, 11) shape our repertoires so tightly that the majority of TCR beta CDR3 variants expressed by naive T cells leaving the thymus are chosen from a “short-list” of just under 10⁸ amino acid variants – even shorter than the 2 × 10⁹ “effective sequence space” estimated by Robins and colleagues (1). Nevertheless, the repertoire has a complex structure, and clonotypes characterized as low-complexity [see Figure 7 in Ref. (4)] form the predominant backbone of the shared clonotype pool. Interestingly, when we examined the intersection of all nine donor samples, we found that the number of donors in which a given clonotype can be detected follows a power-law distribution with an exponent of −2.95 and R² = 0.99 (Figure 1D). These findings confirm the fractal structure of the human TCR beta repertoire that determines the landscape of shared clonotypes (1–3, 12), and deeper profiling experiments may reveal an even more complex picture.
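Both statistics in this paragraph, the singleton/doubleton-based richness estimate and the power-law exponent of the donor-sharing distribution, can be computed from per-donor clonotype sets. The Python sketch below is an illustration under assumptions, not the method of Ref. (8): that reference adapts Chao's approach to a fixed maximum number of classes, whereas `chao_with_ci` implements the classic Chao1 estimator, S_obs + f1²/(2f2), with its log-normal confidence interval, so it may yield different bounds. The helper names and toy donor sets are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def incidence_counts(donor_sets):
    """Map k -> number of clonotypes detected in exactly k donors."""
    per_clonotype = Counter()
    for clonotypes in donor_sets:
        per_clonotype.update(set(clonotypes))  # each donor counted once per clonotype
    return Counter(per_clonotype.values())


def chao_with_ci(s_obs, f1, f2, alpha=0.001):
    """Classic Chao1 richness estimate with a log-normal confidence interval.

    f1 / f2 are singleton / doubleton counts. Returns (estimate, lower, upper).
    """
    if f1 == 0:
        return float(s_obs), float(s_obs), float(s_obs)  # no evidence of unseen classes
    if f2 == 0:
        raise ValueError("classic Chao1 needs f2 > 0; use a bias-corrected form")
    f0 = f1 * f1 / (2.0 * f2)  # estimated number of unseen sequences
    r = f1 / f2
    var = f2 * (0.5 * r ** 2 + r ** 3 + 0.25 * r ** 4)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    c = math.exp(z * math.sqrt(math.log(1.0 + var / f0 ** 2)))
    return s_obs + f0, s_obs + f0 / c, s_obs + f0 * c


def loglog_slope(freqs):
    """Least-squares slope of log f_k versus log k (the power-law exponent)."""
    pts = [(math.log(k), math.log(v)) for k, v in freqs.items() if v > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical toy donors (not study data).
    donors = [{"CASSA", "CASSB", "CASSC"}, {"CASSA", "CASSD"}, {"CASSA", "CASSB", "CASSE"}]
    f = incidence_counts(donors)
    print(dict(f))
    print(chao_with_ci(sum(f.values()), f[1], f[2], alpha=0.001))
    print(loglog_slope(f))
```

On real data, `incidence_counts` over all nine donors would supply the f_k values whose log-log slope corresponds to the reported exponent, while f1 and f2 from each donor pair feed the richness estimate.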

[1] Wen-Han Hwang, et al. Estimating the Richness of a Population When the Maximum Number of Classes Is Fixed: A Nonparametric Solution to an Archaeological Problem. PLoS ONE, 2012.

[2] J. McCluskey, et al. T-cell receptor bias and immunity. Current Opinion in Immunology, 2008.

[3] P. Doherty, et al. Structural determinants of T-cell receptor bias in immunity. Nature Reviews Immunology, 2006.

[4] D. Price, et al. The molecular basis for public T-cell responses? Nature Reviews Immunology, 2008.

[5] Mark M. Davis, et al. T-cell antigen receptor genes and T-cell recognition. Nature, 1988.

[6] Yuanyue Li, et al. Recombinatorial Biases and Convergent Recombination Determine Interindividual TCRβ Sharing in Murine Thymocytes. The Journal of Immunology, 2012.

[7] Olga V. Britanova, et al. Mother and Child T Cell Receptor Repertoires: Deep Profiling Study. Frontiers in Immunology, 2013.

[8] Daniel C. Douek, et al. Convergent recombination shapes the clonotypic landscape of the naïve T-cell repertoire. Proceedings of the National Academy of Sciences, 2010.

[9] D. S. Sivia, et al. Data Analysis: A Bayesian Tutorial. Oxford University Press, 1996.

[10] Daniel C. Douek, et al. A Mechanism for TCR Sharing between T Cell Subsets and Individuals Revealed by Pyrosequencing. The Journal of Immunology, 2011.

[11] Mikhail Shugay, et al. MiTCR: software for T-cell receptor sequencing data analysis. Nature Methods, 2013.

[12] C. Carlson, et al. Overlap and Effective Size of the Human CD8+ T Cell Receptor Repertoire. Science Translational Medicine, 2010.

[13] Abigail Wacher, et al. Comprehensive assessment of T-cell receptor beta-chain diversity in αβ T cells. Blood, 2009.