The BSGatlas: An enhanced annotation of genes and transcripts for the Bacillus subtilis genome with improved information access

The genome of Bacillus subtilis continues to provide exciting genomic insights. However, the growing collective genomic knowledge about this micro-organism is spread across multiple annotation resources. Thus, the full annotation is not directly accessible, either for specific genes or for large-scale high-throughput analyses. Furthermore, access to annotation of non-coding RNA genes (ncRNAs) and polycistronic mRNAs is difficult. To address these challenges, we introduce the Bacillus subtilis genome atlas, BSGatlas, in which we integrate and unify multiple existing annotation resources. Our integration provides twice as many ncRNAs as the individual resources, improves the positional annotation for 70% of the combined ncRNAs, and makes it possible to infer specific ncRNA types. Moreover, we unify known transcription start sites, termination sites, and transcriptional units (TUs) into a comprehensive transcript map. This transcript map implies 815 new TUs and 6,164 untranslated regions (UTRs), which is a five-fold increase over existing resources. Furthermore, we find 2,309 operons covering the transcriptional annotation for 93% of all genes, corresponding to an improvement of 11%. The BSGatlas is available in multiple formats. A user can either download the entire annotation in the standardized GFF3 format, which is compatible with most bioinformatics tools for omics and high-throughput studies, or view the annotation in an online browser at http://rth.dk/resources/bsgatlas.

Importance

The Bacillus subtilis genome has been studied in numerous contexts, and consequently multiple efforts have been made to provide a complete annotation. Unfortunately, a number of resources are no longer maintained, and (i) the collective annotation knowledge is dispersed over multiple resources, each with a different focus on the type of annotation information it provides. (ii) Thus, it is difficult to obtain information for a genomic region or genes of interest easily and at a large scale. (iii) Furthermore, all resources are essentially incomplete when it comes to annotating non-coding and structured RNA, and transcripts in general. Here, we address all three problems by first collecting existing annotations of genes and of transcript start and termination sites; then resolving discrepancies among the annotations and combining them, which doubled the number of ncRNAs; then inferring full transcripts and 2,309 operons from the combined knowledge of known transcript boundaries and meta-information; and, critically, providing it all in a standardized UCSC browser. That interface and its powerful set of functionalities allow users to access all the information in a single resource and enable them to overlay their own data on top of the full annotation.
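
As a rough illustration of the kind of large-scale access that a GFF3 download is meant to support, the following minimal Python sketch parses GFF3-formatted text with the standard library only and counts features by type. The function names, the sample records, and the feature IDs and names are hypothetical placeholders; they are not taken from the BSGatlas file, whose exact feature types and attributes may differ.

```python
from collections import Counter
from urllib.parse import unquote


def parse_gff3(text):
    """Yield one dict per feature line of GFF3-formatted text."""
    for line in text.splitlines():
        line = line.strip()
        if line == "##FASTA":
            # An optional embedded sequence section ends the feature table.
            break
        if not line or line.startswith("#"):
            # Skip blank lines, directives, and comments.
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            # GFF3 feature lines have exactly nine tab-separated columns.
            continue
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        attributes = {}
        for pair in attrs.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # Attribute keys and values may be percent-encoded.
                attributes[unquote(key)] = unquote(value)
        yield {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),  # 1-based, inclusive
            "end": int(end),      # 1-based, inclusive
            "strand": strand,
            "attributes": attributes,
        }


def count_by_type(features):
    """Count features per value of the GFF3 'type' column."""
    return Counter(feature["type"] for feature in features)


# Hypothetical sample records, for illustration only.
SAMPLE_GFF3 = (
    "##gff-version 3\n"
    "chr\texample\tgene\t100\t400\t.\t+\t.\tID=gene1;Name=hypA\n"
    "chr\texample\tncRNA\t500\t620\t.\t-\t.\tID=rna1;Name=hypothetical%20sRNA\n"
    "chr\texample\tgene\t700\t900\t.\t+\t.\tID=gene2\n"
)

if __name__ == "__main__":
    for ftype, n in sorted(count_by_type(parse_gff3(SAMPLE_GFF3)).items()):
        print(ftype, n)
```

To apply the same sketch to a downloaded annotation, one would read the file contents into a string and pass it to `parse_gff3`; for large files, iterating over the file line by line instead of calling `splitlines()` would be a natural adaptation.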
