Profiling model T-cell metagenomes with short reads

MOTIVATION: T-cell receptor (TCR) diversity in peripheral blood has not yet been fully profiled at sequence-level resolution. Each T-cell clonotype expresses a unique receptor, generated by somatic recombination of TCR genes, and the enormous potential for T-cell diversity makes repertoire analysis challenging. We developed a sequencing approach and assembly software (immuno-SSAKE, or iSSAKE) for profiling T-cell metagenomes using short reads from massively parallel sequencing platforms.

RESULTS: Models of sequence diversity for the TCR beta-chain CDR3 region were built from empirical data and used to simulate, at random, distinct TCR clonotypes at abundances of 1-20 p.p.m. From these simulated TCRbeta (sTCRbeta) sequences, we randomly generated 20 million 36 nt reads with 1-2% random error, 20 million 42 nt or 50 nt reads with 1% random error, and 20 million 36 nt reads with 1% error modeled on real short-read data. Reads that aligned to the end of known TCR variable (V) genes and had consecutive unmatched bases in the adjacent CDR3 were used to seed iSSAKE de novo assemblies of the CDR3. With assembled 36 nt reads, we detect over 51% and 63% of rare (1 p.p.m.) clonotypes under the random and modeled error distributions, respectively. We detect over 99% of more abundant clonotypes (6 p.p.m. or higher) with either error distribution. Longer reads improve sensitivity: assembled 42 nt and 50 nt reads identify 82.0% and 94.7% of rare (1 p.p.m.) clonotypes, respectively. Our approach illustrates the feasibility of complete profiling of the TCR repertoire using new massively parallel short-read sequencing technology.

AVAILABILITY: ftp://ftp.bcgsc.ca/supplementary/iSSAKE
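The read simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: `simulate_reads`, `mutate` and `expected_reads` are hypothetical names, reads are assumed to be drawn uniformly at random from a pool in which each clonotype's share equals its p.p.m. abundance, and only the uniform random substitution model is implemented (the error profile modeled on real short-read data is not reproduced).

```python
import random

BASES = "ACGT"


def mutate(read, error_rate, rng):
    """Apply independent substitution errors at a uniform per-base rate."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            # Replace the base with one of the other three, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_reads(clonotypes, n_reads, read_len, error_rate, seed=0):
    """Draw fixed-length reads from clonotype templates.

    clonotypes: list of (template_sequence, abundance_ppm) pairs; each template
    must be at least read_len long.
    Assumption (hypothetical model): a template is picked with probability
    proportional to its p.p.m. abundance and the read start is uniform along it.
    Returns (clonotype_index, read) pairs so detection can be scored later.
    """
    rng = random.Random(seed)
    indices = range(len(clonotypes))
    weights = [ppm for _, ppm in clonotypes]
    reads = []
    for _ in range(n_reads):
        idx = rng.choices(indices, weights=weights, k=1)[0]
        template = clonotypes[idx][0]
        start = rng.randrange(len(template) - read_len + 1)
        fragment = template[start:start + read_len]
        reads.append((idx, mutate(fragment, error_rate, rng)))
    return reads


def expected_reads(ppm, n_reads):
    """Expected reads from a clonotype at `ppm` parts per million of the pool."""
    return ppm * n_reads / 1_000_000
```

Under this sampling assumption, `expected_reads(1, 20_000_000)` returns 20.0: a 1 p.p.m. clonotype contributes about 20 of the 20 million reads, spread along its whole sequence, so only some of them overlap the V-CDR3 junction used for seeding. That scarcity is consistent with rare clonotypes being the hardest to recover.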

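The seeding and assembly step can be illustrated with a toy seed-and-extend routine. This is a minimal sketch, not iSSAKE's implementation: `longest_prefix_overlap`, `find_seeds` and `extend_3prime` are hypothetical names, exact string matching stands in for real read alignment (so sequencing errors are not tolerated), and the thresholds `min_overlap`, `min_ratio` and `max_len` are illustrative assumptions. A read counts as a seed when a prefix of at least `min_overlap` bases equals the last bases of a V gene and at least one further base extends past it into the CDR3; the contig then grows one base at a time by majority vote over the next base of every read whose 5' end overlaps the contig's 3' end, stopping when no read overlaps or the vote is ambiguous.

```python
from collections import Counter


def longest_prefix_overlap(seq, read, min_overlap):
    """Length of the longest read prefix (>= min_overlap, leaving at least one
    base of the read unused) that equals a suffix of seq; 0 if none."""
    for o in range(min(len(read) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(read[:o]):
            return o
    return 0


def find_seeds(reads, v_gene_end, min_overlap=12):
    """Reads that align to the end of a V gene and continue past it into CDR3."""
    return [r for r in reads if longest_prefix_overlap(v_gene_end, r, min_overlap)]


def extend_3prime(contig, reads, min_overlap=12, min_ratio=0.7, max_len=500):
    """Greedily grow contig one base at a time from a consensus of overlapping reads."""
    while len(contig) < max_len:
        votes = Counter()
        for read in reads:
            o = longest_prefix_overlap(contig, read, min_overlap)
            if o:
                votes[read[o]] += 1  # the read's next base beyond the overlap
        if not votes:
            break  # no read overlaps the 3' end: stop
        base, count = votes.most_common(1)[0]
        if count / sum(votes.values()) < min_ratio:
            break  # ambiguous consensus (e.g. two clonotypes diverge): stop
        contig += base
    return contig
```

The exhaustive scan over every read at every extension step keeps the logic easy to check but is quadratic, so it only suits toy inputs; assembling from 20 million reads would require an index over read prefixes.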