RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It provides a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. To make RAST a more useful research tool and to keep pace with advances in bioinformatics, a version of RAST that is both customizable and extensible has become desirable. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features, as well as the ability to add custom features to an annotation job. RASTtk also accommodates batch submission of genomes and lets users customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
