MGnify: the microbiome analysis resource in 2020

Abstract
MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. An updated approach to data analysis (version 5.0) has recently been released, replacing the previous single pipeline with multiple analysis pipelines that are tailored to the input data and formally described using the Common Workflow Language, enabling greater provenance, reusability and reproducibility. The new pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2), together with expanded protein functional annotations. Predictions of biochemical pathways and systems have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has led to a six-fold increase in the number of datasets it has assembled and analysed. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.
