Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles

Significance

High-throughput sequencing of B-cell immunoglobulin receptors is providing unprecedented insight into adaptive immunity. A key step in analyzing these data involves assignment of the germline variable (V), diversity (D), and joining (J) gene-segment alleles that comprise each immunoglobulin sequence by matching them against a database of known V(D)J alleles. However, this process will fail for sequences that use previously undetected alleles, whose frequency in the population is unclear. Here we describe TIgGER, a computational method that significantly improves V(D)J allele assignments by first determining the complete set of gene segments carried by a subject, including novel alleles. The application of TIgGER identifies a surprisingly high frequency of novel alleles, highlighting the critical need for this approach.

Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete, and novel polymorphisms are hard to differentiate from the frequent somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, such as IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and affected 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
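
The difficulty of telling a novel germline polymorphism apart from somatic hypermutation can be illustrated with a toy example. The sketch below is not TIgGER's algorithm, which this summary does not describe in detail. It assumes only the general intuition that a germline difference should appear at the same position, with the same base, in nearly all reads assigned to an allele, whereas somatic hypermutations are scattered across positions. The function name `candidate_polymorphisms`, the thresholds `min_reads` and `min_fraction`, and the short sequences are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def candidate_polymorphisms(germline, reads, min_reads=10, min_fraction=0.9):
    """Flag positions where nearly all reads share one non-germline base.

    Illustrative heuristic only (an assumption, not the published method):
    `germline` is the database allele sequence, `reads` are pre-aligned,
    gap-free sequences assigned to that allele.
    Returns a dict mapping 0-based position -> consensus base.
    """
    if len(reads) < min_reads:
        return {}  # too few reads to say anything

    candidates = {}
    for pos, germ_base in enumerate(germline):
        # Count the bases observed at this position across all reads.
        counts = Counter(read[pos] for read in reads if pos < len(read))
        total = sum(counts.values())
        if total == 0:
            continue
        base, n = counts.most_common(1)[0]
        # A shared, near-universal mismatch suggests a germline difference.
        if base != germ_base and n / total >= min_fraction:
            candidates[pos] = base
    return candidates


# Hypothetical toy data: every read differs from the database allele at
# position 3 (T -> G); four reads carry one additional scattered mutation.
germline = "ACGTACGTAC"
novel = "ACGGACGTAC"
reads = [novel] * 8 + ["TCGGACGTAC", "ACGGACGTAA", "ACGGTCGTAC", "ACGGACCTAC"]

print(candidate_polymorphisms(germline, reads))  # {3: 'G'}
```

In this toy data the scattered single-read mutations are ignored, while the mismatch shared by all twelve reads is reported as a candidate. A real analysis would also need to account for sequencing error, clonal expansion, and mutation hot spots, none of which this sketch models.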
