RTCR: a pipeline for complete and accurate recovery of T cell repertoires from high throughput sequencing data

Motivation: High-throughput sequencing (HTS) has enabled researchers to probe the human T cell receptor (TCR) repertoire, which consists of many rare sequences. Distinguishing true but rare TCR sequences from variants generated by polymerase chain reaction (PCR) and sequencing errors remains a formidable challenge. The conventional approach to handling errors is to remove low-quality reads and/or rare TCR sequences. Such filtering discards a large number of true, often rare, TCR sequences, yet accurate identification and quantification of rare TCR sequences are essential for estimating repertoire diversity.

Results: We devised a pipeline, called Recover TCR (RTCR), that accurately recovers TCR sequences, including rare ones, from HTS data (including barcoded data), even at low coverage. RTCR employs a data-driven statistical model to correct PCR and sequencing errors adaptively. Using simulations, we demonstrate that RTCR readily adapts to the error profiles of different types of sequencers and exhibits consistently high recall and precision, even at low coverage where other pipelines perform poorly. Using published real data, we show that RTCR accurately resolves sequencing errors and outperforms all other pipelines tested.

Availability and Implementation: The RTCR pipeline is implemented in Python (v2.7) and C and is freely available at http://uubram.github.io/RTCR/ along with documentation and examples of typical usage.

Contact: b.gerritsen@uu.nl
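
The abstract does not describe RTCR's statistical model in detail. To make the core problem concrete, the Python sketch below (written to run under Python 2.7 or 3) shows one generic way to separate true rare sequences from error variants: a low-count sequence is merged into a more abundant sequence one substitution away only when a Poisson model of substitution errors can plausibly explain its read count. This is not RTCR's algorithm; the uniform per-base error rate, the significance threshold `alpha`, the one-mismatch neighbourhood, the toy sequences and all function names are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):  # accumulate P(X <= k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def one_mismatch_neighbours(seq, alphabet="ACGT"):
    """Yield every sequence that differs from seq by one substitution."""
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def correct_errors(counts, error_rate=0.001, alpha=0.001):
    """Hypothetical error-correction step (an assumption, not RTCR's model).

    Sequences are visited from most to least abundant. A sequence is merged
    into a more abundant, already accepted neighbour one substitution away if
    its read count is plausible as substitution errors from that neighbour;
    otherwise it is kept as a true sequence.
    """
    accepted = Counter()
    for seq in sorted(counts, key=lambda s: (-counts[s], s)):
        n = counts[seq]
        parent = None
        for nb in one_mismatch_neighbours(seq):
            if nb in accepted and counts[nb] > n:
                # Expected reads carrying this one specific substitution,
                # assuming errors are uniform over the 3 alternative bases.
                expected = counts[nb] * error_rate / 3.0
                if poisson_sf(n, expected) >= alpha:
                    # Prefer the most abundant plausible parent.
                    if parent is None or counts[nb] > counts[parent]:
                        parent = nb
        accepted[parent if parent is not None else seq] += n
    return accepted


def precision_recall(recovered, truth):
    """Precision and recall of a recovered set of sequences."""
    recovered, truth = set(recovered), set(truth)
    tp = len(recovered & truth)
    precision = float(tp) / len(recovered) if recovered else 0.0
    recall = float(tp) / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: one abundant clone, a true one-mismatch variant,
# a rare true clone, and a single-read error variant of the abundant clone.
truth = {"ACGTACGTAC", "ACGTACGTAG", "TTGACCATGA"}
observed = Counter({"ACGTACGTAC": 1000, "ACGTACGTAG": 20,
                    "TTGACCATGA": 3, "ACGTACGTAA": 1})

corrected = correct_errors(observed)
print(precision_recall(corrected, truth))  # (1.0, 1.0)

# A naive abundance filter also drops the rare true clone.
filtered = [s for s, n in observed.items() if n >= 5]
print(precision_recall(filtered, truth))  # precision 1.0, recall 2/3
```

On the toy data, the merge step keeps the three-read clone and absorbs the single-read error variant, whereas a naive `n >= 5` filter loses the rare clone and its recall falls to 2/3, which mirrors the trade-off described under Motivation. A real pipeline would also have to handle indels, position-dependent error rates and errors that accumulate over PCR cycles.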