Phylogeny analysis of whole protein-coding genes in metagenomic data detected an environmental gradient for the microbiota

Environmental factors affect the growth of microorganisms and therefore alter the composition of microbiota. Correlative analysis of the relationship between metagenomic composition and the environmental gradient can help elucidate key environmental factors and the principles by which microbial communities are established. However, a reasonable method to quantitatively compare whole metagenomic data and identify the primary environmental factors for the establishment of microbiota has not yet been reported. In this study, we developed a method to compare whole proteomes deduced from metagenomic shotgun sequencing data and to quantitatively display their phylogenetic relationships as metagenomic trees. We called this method Metagenomic Phylogeny by Average Sequence Similarity (MPASS). We also compared one of the metagenomic trees with dendrograms of environmental factors using a comparison tool for phylogenetic trees. The MPASS method correctly constructed metagenomic trees of simulated metagenomes and of soil and water samples. The topology of the metagenomic tree of samples from the Kirishima hot springs area in Japan was highly similar to that of the dendrograms based on previously reported environmental factors for this area. The topology of the metagenomic tree also reflected the dynamics of microbiota at the taxonomic and functional levels. Our results strongly suggest that MPASS can successfully classify metagenomic shotgun sequencing data based on the similarity of whole protein-coding sequences, and that it will be useful for identifying the principal environmental factors for the establishment of microbial communities.
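
The two steps described above, turning average sequence similarity into pairwise distances between samples and comparing the topology of two trees, can be illustrated with a minimal sketch. This is not the MPASS implementation: the abstract does not give the scoring scheme or name the tree comparison tool, so the 0 to 1 similarity scale, the `best_hits` data layout, the nested-tuple tree format, the clade-based symmetric difference, and all function names are assumptions made for illustration only.

```python
from itertools import combinations


def average_similarity(hits_ab, hits_ba):
    """Mean of best-hit similarity scores pooled over both search directions.

    Assumption: scores are already normalised to the range 0..1.
    """
    scores = list(hits_ab) + list(hits_ba)
    return sum(scores) / len(scores)


def distance_matrix(samples, best_hits):
    """Build symmetric pairwise distances as 1 - average similarity.

    best_hits[(a, b)] is a hypothetical list holding, for each protein of
    sample a, the similarity of its best hit in sample b.
    """
    dist = {}
    for a, b in combinations(samples, 2):
        sim = average_similarity(best_hits[(a, b)], best_hits[(b, a)])
        dist[(a, b)] = dist[(b, a)] = 1.0 - sim
    return dist


def clades(tree):
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is a sample name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_difference(tree_a, tree_b):
    """Count clades found in exactly one of two rooted trees (0 = same topology)."""
    return len(clades(tree_a) ^ clades(tree_b))


if __name__ == "__main__":
    # Toy data only; these numbers are not from the study.
    hits = {("S1", "S2"): [0.9, 0.7], ("S2", "S1"): [0.8, 0.6]}
    print(distance_matrix(["S1", "S2"], hits))

    metagenomic_tree = ((("S1", "S2"), "S3"), "S4")
    environment_tree = ((("S1", "S2"), "S4"), "S3")
    print(clade_difference(metagenomic_tree, environment_tree))
```

A distance matrix of this kind could be passed to any standard tree-building method, and a smaller clade difference between the metagenomic tree and an environmental dendrogram indicates closer agreement in topology.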
