Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data

Motivation: Adaptive immune receptor repertoire sequencing (AIRR-seq) makes it possible to identify and track B cell clonal expansions during adaptive immune responses. Members of a B cell clone descend from a common ancestor and share the same initial V(D)J rearrangement, but their B cell receptor (BCR) sequences may differ because of accumulated somatic hypermutations (SHMs). Clonal relationships are inferred from AIRR-seq data by analyzing the BCR sequences, and the most common methods focus on the highly diverse CDR3 region. However, clonally related cells often also share SHMs that accumulated during affinity maturation. Here, we investigate whether shared SHMs in the V and J segments of the BCR can be leveraged along with the CDR3 sequence to improve the identification of clonally related sequences. We develop independent distance functions that capture shared mutations and CDR3 similarity, and combine them in a spectral clustering framework. Using simulated data, we show that this model improves both the sensitivity and the specificity of identifying clonal relationships.

Availability: Source code for this method is freely available in the SCOPer (Spectral Clustering for clOne Partitioning) R package (version 0.2 or newer) in the Immcantation framework: www.immcantation.org under the CC BY-SA 4.0 license.

Contact: steven.kleinstein@yale.edu
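The idea of combining a CDR3 distance with a shared-mutation score can be illustrated with a minimal Python sketch. This is not the SCOPer implementation (SCOPer is an R package): the function names, the record layout, the Jaccard-style shared-mutation score, the Gaussian kernel, and the `sigma` and `weight` parameters are all assumptions chosen for illustration. The sketch only builds a pairwise affinity of the kind a spectral clustering step could take as input; it does not perform the clustering itself.

```python
import math

def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mutations(seq, germline):
    """Set of (position, base) pairs where seq differs from its germline."""
    return {(i, s) for i, (s, g) in enumerate(zip(seq, germline)) if s != g}

def shared_mutation_fraction(seq_a, seq_b, germline):
    """Jaccard-style score: shared mutations over all observed mutations.

    Illustrative assumption, not the scoring used by the authors.
    """
    mut_a = mutations(seq_a, germline)
    mut_b = mutations(seq_b, germline)
    union = mut_a | mut_b
    return len(mut_a & mut_b) / len(union) if union else 0.0

def affinity(rec_a, rec_b, germline, sigma=0.2, weight=1.0):
    """Hypothetical combined affinity in (0, 1].

    The CDR3 distance is shrunk when the two sequences share mutations,
    then passed through a Gaussian kernel. sigma and weight are made-up
    tuning parameters.
    """
    d_cdr3 = hamming_fraction(rec_a["cdr3"], rec_b["cdr3"])
    shared = shared_mutation_fraction(rec_a["v"], rec_b["v"], germline)
    d = d_cdr3 / (1.0 + weight * shared)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

# Toy data: one germline V segment and three sequences with made-up bases.
germline_v = "ACGTACGTAC"
seq_a = {"v": "ACGAACGTTC", "cdr3": "TGTGCGAGA"}  # two V mutations
seq_b = {"v": "ACGAACGTAC", "cdr3": "TGTGCGAGG"}  # shares one mutation with seq_a
seq_c = {"v": "ACGTACGTAC", "cdr3": "TGTGCGAGG"}  # unmutated V segment

records = [seq_a, seq_b, seq_c]
# Symmetric affinity matrix; a spectral method would operate on this.
affinity_matrix = [
    [affinity(r, s, germline_v) for s in records] for r in records
]
```

In this toy example, `seq_b` and `seq_c` are equally distant from `seq_a` in CDR3, but `seq_b` shares a V-segment mutation with `seq_a`, so its affinity to `seq_a` is higher. That is the intuition behind using shared SHMs as additional evidence of clonal relatedness.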
