IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis

The analysis of concentrations of circulating antibodies in serum (the antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires, next-generation sequencing (NGS) and mass spectrometry (MS), present difficult computational challenges because antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutation. Therefore, the protein database required for the interpretation of spectra from circulating antibodies must be customized for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone, and even a single nucleotide error precludes identification of a peptide by standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm, which performs error correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires.

Availability and implementation: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools.

Contact: ppevzner@ucsd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
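The claim that a single nucleotide error precludes peptide identification can be illustrated with a minimal Python sketch. It is not part of IgRepertoireConstructor: the read sequences, the `translate` helper, and the exact-match lookup are hypothetical stand-ins for a real database search, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 0, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical error-free read and the same read with one substitution (A -> G).
true_read = "GAAGTGCAGCTGGTGGAA"
noisy_read = "GAAGTGCGGCTGGTGGAA"

true_peptide = translate(true_read)    # peptide actually present in serum
noisy_peptide = translate(noisy_read)  # peptide the uncorrected read implies

# A database built from the uncorrected read holds only the wrong peptide,
# so an exact-match lookup for the true peptide fails.
database = {noisy_peptide}
found = true_peptide in database

print(true_peptide, noisy_peptide, found)
```

In this toy case the substitution turns a glutamine codon into an arginine codon, so the peptide observed by MS has no exact counterpart in the read-derived database. This is the reason reads need error correction before the database is built.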
