G4PromFinder: an algorithm for predicting transcription promoters in GC-rich bacterial genomes based on AT-rich elements and G-quadruplex motifs

Background: Over the last few decades, computational genomics has contributed tremendously to deciphering biology from genome sequences and related data. Considerable effort has been devoted to predicting transcription promoter and terminator sites, which represent the essential “punctuation marks” for DNA transcription. Computational prediction of promoters in prokaryotes remains far from solved. Most published bacterial promoter prediction tools rely on a consensus-sequence search and were designed specifically for vegetative σ70 promoters. They are therefore not suitable for promoter prediction in bacteria that encode many σ factors, such as actinomycetes.

Results: In this study we investigated whether putative promoters in prokaryotes can be identified from evolutionarily conserved motifs, focusing on GC-rich bacteria, in which promoter prediction with conventional, consensus-based algorithms is often not exhaustive. Here, we introduce G4PromFinder, a novel algorithm that predicts putative promoters based on AT-rich elements and G-quadruplex DNA motifs. We tested its performance using available genomic and transcriptomic data for the model microorganisms Streptomyces coelicolor A3(2) and Pseudomonas aeruginosa PA14. We compared our results with those obtained by three currently available promoter prediction algorithms: the σ70 consensus-based PePPER, the σ factor consensus-based bTSSfinder, and PromPredict, which is based on double-helix DNA stability. Our results demonstrated that G4PromFinder is more suitable than the three reference tools for both genomes. Our algorithm achieved the highest accuracy (F1-scores of 0.61 and 0.53 in the two genomes), compared with the next best tool, PromPredict (F1-scores of 0.46 and 0.48). Consensus-based algorithms performed worse on the analyzed GC-rich genomes.

Conclusions: Our analysis shows that G4PromFinder is a powerful tool for promoter search in GC-rich bacteria, especially bacteria that encode many σ factors, such as the model microorganism S. coelicolor A3(2). Moreover, consensus-based tools and, more generally, tools based on specific features of bacterial σ factors seem to perform less well for promoter prediction in these types of bacterial genomes.
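The two sequence features named above (AT-rich elements and G-quadruplex motifs) and the F1-score used for the comparison can be illustrated with a minimal Python sketch. This is not the G4PromFinder implementation: the regular expression (four runs of at least two guanines separated by loops of 1 to 7 nucleotides), the window size, the AT threshold, and all function names are assumptions chosen for illustration only.

```python
import re

# ASSUMPTION: a generic putative G-quadruplex pattern, four G-runs of
# length >= 2 separated by loops of 1-7 nt. These thresholds are
# illustrative and are not taken from G4PromFinder.
G4_PATTERN = re.compile(r"G{2,}(?:[ACGT]{1,7}?G{2,}){3}")


def at_fraction(seq):
    """Return the fraction of A/T bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq, window=25, threshold=0.4):
    """Return start positions of sliding windows whose AT fraction is
    at least `threshold`. Window size and threshold are hypothetical."""
    seq = seq.upper()
    return [
        i
        for i in range(len(seq) - window + 1)
        if at_fraction(seq[i:i + window]) >= threshold
    ]


def find_g4_motifs(seq):
    """Return (start, end) spans of non-overlapping matches of the
    assumed G-quadruplex pattern on the given strand."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(seq.upper())]


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, computed from counts of
    true positives, false positives, and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A scan in this style would only search the strand it is given; the complementary strand would need to be handled separately, for example by also scanning the reverse complement.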
