GeM-Pro: a tool for genome functional mining and microbial profiling

GeM-Pro is a new tool for gene mining and functional profiling of bacteria. It first identifies homologous genes using BLAST and then applies three filtering steps to select orthologous gene pairs. The first filter uses BLAST score values to identify trivial paralogs. The second filter uses the shared identity percentages of the trivial paralogs found as internal witnesses of non-orthology to set orthology cutoff values. The third filter uses conditional probabilities of orthology and non-orthology to define new cutoffs and to generate supporting information for the orthology assignments. Additionally, a subsidiary tool, called q-GeM, was developed to mine traits of interest using logistic regression (LR) or linear discriminant analysis (LDA) classifiers. q-GeM uses computing resources more efficiently than GeM-Pro but needs an initial classified set of homologous genes to train the LR and LDA classifiers. Hence, q-GeM can be used to analyze new sets of strains with available genome sequences without rerunning a complete GeM-Pro analysis. Finally, GeM-Pro and q-GeM perform a synteny analysis that evaluates the integrity and genomic arrangement of specific pathways of interest in order to infer their presence. The tools were applied to more than 2 million homologous pairs encoded by Bacillus strains, generating statistically supported predictions of trait content. The different patterns of encoded traits of interest were successfully used to perform a descriptive bacterial profiling.
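
The idea behind the second filter, using trivial paralogs as internal witnesses of non-orthology, can be illustrated with a minimal sketch. The abstract does not state how the cutoff is computed, so the percentile rule below is an assumption made only for illustration, and the function names `paralog_identity_cutoff` and `select_candidate_orthologs` are hypothetical rather than part of GeM-Pro.

```python
from statistics import quantiles


def paralog_identity_cutoff(paralog_identities, percentile=95):
    """Derive an identity cutoff (%) from known non-orthologous pairs.

    ASSUMPTION: the cutoff is taken as a high percentile of the identity
    values observed among trivial paralogs. The source only says that
    these identities are used to set orthology cutoffs.
    """
    if len(paralog_identities) < 2:
        raise ValueError("need at least two paralog identity values")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cuts = quantiles(paralog_identities, n=100, method="inclusive")
    return cuts[percentile - 1]


def select_candidate_orthologs(pairs, cutoff):
    """Keep homologous pairs whose identity exceeds the cutoff.

    Each pair is a tuple (gene_a, gene_b, identity_percent).
    """
    return [pair for pair in pairs if pair[2] > cutoff]
```

Under this assumption, a homologous pair is retained as a candidate ortholog only if its identity is higher than that of nearly all pairs already known to be non-orthologous. The identity values would come from the BLAST step; the third filter (conditional probabilities) is not covered by this sketch.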
