TCRklass: A New K-String–Based Algorithm for Human and Mouse TCR Repertoire Characterization

Next-generation sequencing technology has advanced the study of the human TCR repertoire, which is essential for adaptive immunity. To decipher the complexity of the TCR repertoire, we developed an integrated pipeline, TCRklass, which uses a K-string–based algorithm and significantly improves accuracy and performance over existing tools. We tested TCRklass using manually curated short-read datasets in comparison with in silico datasets; it showed higher precision and recall rates on CDR3 identification. We applied TCRklass to large datasets of two human and three mouse TCR repertoires; it demonstrated higher reliability on CDR3 identification and much less biased V/J profiling, the two components that contribute to the diversity of the repertoire. Because of sequencing cost, short paired-end reads generated by next-generation sequencing technology are and will remain the main source of data, and we believe that TCRklass is a useful and reliable toolkit for TCR repertoire analysis.
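As a rough illustration of the two ideas named in the abstract, the sketch below shows a generic K-string (k-mer) overlap score for assigning a read to a reference segment, and a precision/recall calculation over sets of identified CDR3 sequences. This is not the TCRklass implementation, whose details are not given here: the function names, the choice of k, the scoring rule, and all sequences are hypothetical and chosen only to make the example self-contained.

```python
def k_strings(seq, k):
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_segment(read, references, k=5):
    """Score each reference by the number of K-strings it shares with the read.

    Assumption: a simple shared-K-string count stands in for whatever
    scoring the real pipeline uses.
    """
    read_kmers = k_strings(read, k)
    scores = {
        name: len(read_kmers & k_strings(ref, k))
        for name, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


def precision_recall(predicted, truth):
    """Precision and recall of a predicted set against a ground-truth set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy reference segments (made-up sequences, not real V or J genes).
references = {
    "segA": "ACGTACGTGGCCTTAA",
    "segB": "TTGACCGGTTAACCGG",
}
read = "CGTGGCCTT"  # a substring of segA

best, scores = assign_segment(read, references, k=5)
print(best, scores)

# Toy CDR3 calls versus a curated truth set (made-up amino acid strings).
predicted = {"CASSL", "CASSQ", "CASRT"}
truth = {"CASSL", "CASSQ", "CASSP", "CASSY"}
precision, recall = precision_recall(predicted, truth)
print(precision, recall)
```

In this toy case the read shares all five of its 5-strings with segA and none with segB, and two of the three predicted CDR3 strings appear among the four true ones.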
