Likelihood-Based Inference of B Cell Clonal Families

The human immune system depends on a highly diverse collection of antibody-making B cells. B cell receptor sequence diversity is generated first by a random recombination process called “rearrangement,” which forms progenitor B cells, and then by a Darwinian process of lineage diversification and selection called “affinity maturation.” The resulting receptors can be sequenced in high throughput for research and diagnostics. Such a collection of sequences contains a mixture of lineages, each of which may be quite numerous or may consist of only a single member. As a step toward understanding the process and result of this diversification, one may wish to reconstruct lineage membership, i.e., to cluster sampled sequences according to which came from the same rearrangement event. We call this clustering problem “clonal family inference.” In this paper we describe and validate a likelihood-based framework for clonal family inference built on a multi-hidden Markov model (multi-HMM) of B cell receptor sequences. We describe an agglomerative algorithm to find a maximum likelihood clustering, two approximate algorithms with various trade-offs of speed versus accuracy, and a third, fast algorithm for finding specific lineages. We show that under simulation these algorithms greatly improve upon existing clonal family inference methods, and that they also give significantly different clusters from previous methods when applied to two real data sets.
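The general idea of agglomerative, likelihood-driven clustering can be illustrated with a minimal sketch. This is not the paper's multi-HMM: the function `cluster_loglik`, the constant `MU`, and the toy model (equal-length sequences, a uniform prior over a single unmutated ancestor, independent per-site mutation, and a consensus ancestor in place of a proper marginalization) are all assumptions introduced here for illustration. The sketch starts with every sequence in its own cluster and repeatedly merges the pair of clusters whose merge most increases the total log-likelihood, stopping when no merge helps.

```python
import math
from collections import Counter
from itertools import combinations

MU = 0.05  # hypothetical per-site mutation probability (illustrative only)


def cluster_loglik(seqs):
    """Toy log-likelihood that equal-length sequences share one ancestor.

    Stand-in for a real rearrangement/mutation model: the ancestor is
    approximated by the per-column consensus rather than summed over.
    """
    length = len(seqs[0])
    total = length * math.log(0.25)  # uniform prior over ancestor bases
    for column in zip(*seqs):
        n_match = Counter(column).most_common(1)[0][1]
        n_mismatch = len(column) - n_match
        total += n_match * math.log(1 - MU) + n_mismatch * math.log(MU / 3)
    return total


def agglomerate(seqs):
    """Greedily merge clusters while a merge raises the total log-likelihood."""
    clusters = [[s] for s in seqs]
    logliks = [cluster_loglik(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = cluster_loglik(clusters[i] + clusters[j])
            gain = merged - logliks[i] - logliks[j]  # log likelihood ratio
            if best is None or gain > best[0]:
                best = (gain, i, j)
        gain, i, j = best
        if gain <= 0:
            break  # no merge is favored over keeping clusters separate
        clusters[i] = clusters[i] + clusters[j]
        logliks[i] = cluster_loglik(clusters[i])
        del clusters[j]  # j > i, so index i stays valid
        del logliks[j]
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGCATGCAT", "TTGCATGCAA"]
    for family in agglomerate(reads):
        print(family)
```

Each merge decision here is a log likelihood ratio between "one shared ancestor" and "two independent ancestors," so the uniform ancestor prior acts as a penalty that a merge must overcome. The exhaustive pairwise search is quadratic per step, which is one reason faster approximate variants are attractive at repertoire scale.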
