IMPre: An Accurate and Efficient Software for Prediction of T- and B-Cell Receptor Germline Genes and Alleles from Rearranged Repertoire Data

Large-scale studies of T-cell receptor (TCR) and B-cell receptor (BCR) repertoires through next-generation sequencing are providing valuable insight into adaptive immune responses. Variable(Diversity)Joining [V(D)J] germline genes and alleles must be characterized in detail to facilitate repertoire analyses. However, most species do not have well-characterized TCR/BCR germline genes because of their high homology. In addition, the known germline alleles for humans and other species remain incomplete, which limits the capacity for studying immune repertoires. Herein, we developed “Immune Germline Prediction” (IMPre), a tool for predicting germline V/J genes and alleles using deep-sequencing data derived from TCR/BCR repertoires. We developed a new algorithm, “Seed_Clust,” for clustering, produced a multiway tree for assembly, and optimized the sequence according to the characteristics of rearrangement. We trained IMPre on human samples of T-cell receptor beta (TRB) and immunoglobulin heavy chain (IGH) and then tested it on additional human samples. Accuracies of 97.7, 100, 92.9, and 100% were obtained for TRBV, TRBJ, IGHV, and IGHJ, respectively. Subsampling analyses of these samples showed IMPre to be robust across different data quantities. Subsequently, IMPre was tested on samples from rhesus monkeys and on long human sequences; the highly accurate results demonstrated that IMPre is stable on animal data and across multiple data types. With the rapid accumulation of high-throughput sequence data for TCR and BCR repertoires, IMPre can be applied broadly to obtain novel genes and a large number of novel alleles. IMPre is available at https://github.com/zhangwei2015/IMPre.
