Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V’DJer

Motivation: B-cell receptor (BCR) repertoire profiling is an important tool for understanding the biology of diverse immunologic processes. Current methods for analyzing adaptive immune receptor repertoires depend on PCR amplification of VDJ rearrangements followed by long-read amplicon sequencing spanning the VDJ junctions. Although this approach is effective, it is frequently not feasible because of cost or limited sample material. Additionally, there are many existing datasets for which short-read RNA sequencing data are available but PCR-amplified BCR data are not.

Results: We present V’DJer, an assembly-based method that reconstructs adaptive immune receptor repertoires from short-read RNA sequencing data. The method captures expressed BCR loci from a standard RNA-seq assay. We applied it to 473 melanoma samples from The Cancer Genome Atlas and demonstrate V’DJer’s ability to accurately reconstruct BCR repertoires from short-read mRNA-seq data.

Availability and Implementation: V’DJer is implemented in C/C++, is freely available for academic use, and can be downloaded from GitHub: https://github.com/mozack/vdjer

Contact: benjamin_vincent@med.unc.edu or parkerjs@email.unc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
