PyIR: a scalable wrapper for processing billions of immunoglobulin and T cell receptor sequences using IgBLAST

Background: Recent advances in DNA sequencing technologies have enabled significant leaps in the capacity to generate large volumes of DNA sequence data, spurring rapid growth in the use of bioinformatics to interrogate antibody variable gene repertoires. Common tools for annotating antibody sequences are often limited in functionality, modularity and usability.

Results: We have developed PyIR, a Python wrapper and library for IgBLAST that offers a command-line interface (CLI) and API requiring minimal setup, FASTQ support, chunking of large sequence files, JSON and Python dictionary output, and built-in sequence filtering.

Conclusions: PyIR offers faster processing than multithreaded IgBLAST (version 1.14) when spawning more than 16 processes on a single computer system. Its customizable filtering and data encapsulation allow it to be adapted to a wide range of computing environments, and its API allows IgBLAST to be used in custom bioinformatics workflows.
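The general chunk-and-parallelize strategy behind processing very large FASTQ files with IgBLAST can be sketched in Python as below. This is not PyIR's code or API. The function names (`read_fastq`, `chunked`, `annotate_chunk`, `process_fastq`) and the length filter are hypothetical. The worker is a stand-in for an actual igblastn call, so the sketch runs without IgBLAST or its germline databases installed. It shows reading FASTQ records, converting chunks to FASTA (the query format igblastn reads), filtering, and emitting one JSON object per sequence.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import IO, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence id, nucleotide sequence)


def read_fastq(handle: IO[str]) -> Iterator[Record]:
    """Yield (id, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, unused in this sketch
        yield header[1:].split()[0], seq


def chunked(records: Iterator[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size` items."""
    while True:
        chunk = list(islice(records, size))
        if not chunk:
            return
        yield chunk


def to_fasta(chunk: List[Record]) -> str:
    """Render a chunk as FASTA text, the query format igblastn accepts."""
    return "".join(f">{sid}\n{seq}\n" for sid, seq in chunk)


def annotate_chunk(chunk: List[Record], min_length: int) -> List[dict]:
    """Hypothetical worker. A real pipeline would write to_fasta(chunk) to a
    temporary file, run igblastn on it and parse the output; here we only
    apply a placeholder length filter so the sketch runs without IgBLAST."""
    _query = to_fasta(chunk)  # would become the igblastn -query input
    return [{"sequence_id": sid, "length": len(seq)}
            for sid, seq in chunk if len(seq) >= min_length]


def process_fastq(handle: IO[str], chunk_size: int = 1000,
                  workers: int = 4, min_length: int = 0) -> str:
    """Split the input into chunks, annotate them concurrently and return
    one JSON object per line, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: annotate_chunk(c, min_length),
                           chunked(read_fastq(handle), chunk_size))
        return "".join(json.dumps(rec) + "\n"
                       for batch in results for rec in batch)
```

A thread pool is used here because, in a real pipeline, each worker would mostly wait on an external igblastn process. PyIR's own process-based parallelism may be organized differently. Note that `Executor.map` submits every chunk up front, so a version intended for billions of reads would need to bound the number of chunks in flight.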
