ANNOgesic: a Swiss army knife for the RNA-seq based annotation of bacterial/archaeal genomes

Abstract: To understand the gene regulation of an organism of interest, a comprehensive genome annotation is essential. While some features, such as coding sequences, can be computationally predicted with high accuracy based purely on the genomic sequence, others, such as promoter elements or noncoding RNAs, are harder to detect. RNA sequencing (RNA-seq) has proven to be an efficient method to identify these genomic features and to improve genome annotations. However, processing and integrating RNA-seq data to generate high-resolution annotations is challenging and time-consuming, and it requires numerous steps. We have constructed a powerful and modular tool called ANNOgesic that provides the required analyses and simplifies RNA-seq-based bacterial and archaeal genome annotation. It integrates data from conventional RNA-seq and differential RNA-seq, and it predicts and annotates numerous features, including small noncoding RNAs, with high precision. The software is available under an open-source license (ISCL) at https://pypi.org/project/ANNOgesic/.
