Toward a high-quality pan-genome landscape of Bacillus subtilis by removal of confounding strains

Pan-genome analysis is widely used to study the evolution and genetic diversity of species, particularly bacteria. However, the impact of strain selection on the outcome of pan-genome analysis is poorly understood, and a standard protocol for ensuring high-quality pan-genome results is lacking. In this study, we carried out a series of pan-genome analyses on different strain sets of Bacillus subtilis to determine how individual strains affect the performance and output quality of the analysis. We found that the results of B. subtilis pan-genome analyses can be influenced by the inclusion of misclassified Bacillus subspecies strains, phylogenetically distinct strains, engineered genome-reduced strains, chimeric strains, strains with a large number of unique genes or a large proportion of pseudogenes, and multiple clonal strains. Because these confounding strains can seriously distort the quality and true landscape of the pan-genome, they should be removed during pan-genome analysis. Our study provides new insights into removing the biases introduced by confounding strains at the beginning of data processing. Doing so yields a closer approximation of a high-quality pan-genome landscape of B. subtilis and a more credible B. subtilis pan-genome. This procedure could be added as an important quality-control step in pan-genome analyses to improve their efficiency, ultimately contributing to a better understanding of genome function, evolution, and genome-reduction strategies for B. subtilis.
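
Several of the screening criteria listed above lend themselves to a simple rule-based filter. The Python sketch that follows is illustrative only and is not the authors' procedure. It assumes each strain has already been annotated, with gene, unique-gene, and pseudogene counts taken from an annotation and pan-genome clustering run, and that clonal strains have been grouped beforehand. The `Strain` fields, the `clone_group` labels, and both thresholds (`MAX_PSEUDOGENE_FRACTION`, `UNIQUE_GENE_MAD_FACTOR`) are hypothetical choices. The sketch covers only three of the listed criteria (pseudogene burden, excess unique genes, and clonal redundancy). Misclassified, phylogenetically distinct, genome-reduced, and chimeric strains would need taxonomic or phylogenetic checks that are not shown here.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Strain:
    name: str
    n_genes: int          # total annotated genes
    n_unique_genes: int   # genes found in no other strain of the set
    n_pseudogenes: int
    clone_group: str      # hypothetical label shared by clonal (near-identical) genomes


# Hypothetical thresholds, chosen for illustration only.
MAX_PSEUDOGENE_FRACTION = 0.05
UNIQUE_GENE_MAD_FACTOR = 5.0


def flag_confounding(strains):
    """Return (kept, removed), where removed maps strain name -> reason."""
    removed = {}

    # Robust outlier test on unique-gene counts: median absolute deviation (MAD).
    uniques = [s.n_unique_genes for s in strains]
    med = median(uniques)
    mad = median(abs(u - med) for u in uniques) or 1.0  # avoid division by zero

    for s in strains:
        if s.n_pseudogenes / s.n_genes > MAX_PSEUDOGENE_FRACTION:
            removed[s.name] = "high pseudogene fraction"
        elif (s.n_unique_genes - med) / mad > UNIQUE_GENE_MAD_FACTOR:
            removed[s.name] = "excess unique genes"

    # Keep only the first remaining strain of each clonal group.
    seen_groups = set()
    for s in strains:
        if s.name in removed:
            continue
        if s.clone_group in seen_groups:
            removed[s.name] = "redundant clonal strain"
        else:
            seen_groups.add(s.clone_group)

    kept = [s for s in strains if s.name not in removed]
    return kept, removed
```

A MAD cutoff is used for unique genes so that a single extreme strain does not inflate the threshold, as it would with a mean and standard deviation. Keeping the first strain of each clonal group is an arbitrary tie-break; a real pipeline might instead keep the best-assembled genome.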
