multiPhATE: bioinformatics pipeline for functional annotation of phage isolates

Summary: To address the need for improved phage annotation tools that scale, we created an automated throughput annotation pipeline: multiple-genome Phage Annotation Toolkit and Evaluator (multiPhATE). multiPhATE is a throughput pipeline driver that invokes an annotation pipeline (PhATE) across a user-specified set of phage genomes. The tool incorporates a de novo phage gene-calling algorithm and assigns putative functions to gene calls using protein-, virus-, and phage-centric databases. multiPhATE's modular construction allows the user to run all or any portion of the analyses by acquiring local instances of the desired databases and specifying the desired analyses in a configuration file. We demonstrate multiPhATE by annotating two newly sequenced Yersinia pestis phage genomes. Within multiPhATE, the PhATE processing pipeline can readily be run across multiple processors, making it adaptable for throughput sequencing projects. Software documentation assists the user in configuring the system.

Availability and implementation: multiPhATE was implemented in Python 3.7 and runs as a command-line program under Linux or Unix. multiPhATE is freely available under an open-source BSD3 license from https://github.com/carolzhou/multiPhATE. Instructions for acquiring the databases and third-party software used by multiPhATE are included in the distribution README file. Users may report bugs by submitting them to the GitHub issues page associated with the multiPhATE distribution.

Contact: zhou4@llnl.gov or carol.zhou@comcast.net.

Supplementary information: Data generated during the current study are included as supplementary files available for download at https://github.com/carolzhou/PhATE_docs.
