HumanMetagenomeDB: a public repository of curated and standardized metadata for human metagenomes

Abstract

Metagenomics has become a standard strategy for understanding the functional potential of microbial communities, including the human microbiome. The number of metagenomes in public repositories is currently increasing exponentially. The Sequence Read Archive (SRA) and MG-RAST are the two main repositories for metagenomic data. These databases allow scientists to reanalyze samples and explore new hypotheses. However, mining samples from them can be a limiting factor: the metadata in these repositories are often misannotated, misleading and decentralized, which makes sample reanalysis needlessly complex. The main goal of HumanMetagenomeDB is to simplify the identification and use of public human metagenomes of interest. HumanMetagenomeDB version 1.0 contains the metadata of 69 822 metagenomes. Using established ontologies, we standardized 203 attributes describing host characteristics (e.g. sex, age and body mass index), diagnosis (e.g. cancer, Crohn's disease and Parkinson's disease), location (e.g. country, longitude and latitude), sampling site (e.g. gut, lung and skin) and sequencing (e.g. sequencing platform, average read length and sequence quality). The metagenomes in HumanMetagenomeDB version 1.0 span 58 countries, 9 main sampling sites (i.e. body parts), 58 diagnoses and a wide range of host ages, from newborns to 91 years old. HumanMetagenomeDB is publicly available at https://webapp.ufz.de/hmgdb/.
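The abstract describes two tasks that HumanMetagenomeDB is meant to simplify: mapping inconsistent free-text metadata (for example "stool", "faeces" and "fecal" for the same body site) onto controlled terms, and then selecting metagenomes by host, diagnosis and sampling-site attributes. The Python sketch below illustrates that idea under stated assumptions only. The synonym table, the `Metagenome` record, the `normalize_site` and `select` helpers and the sample identifiers are all hypothetical; they do not reflect the database's actual schema, curation rules or web interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical synonym table mapping raw sample-site annotations to a controlled term.
# Illustrative only; not HumanMetagenomeDB's actual ontology mapping.
SITE_SYNONYMS = {
    "gut": {"gut", "stool", "feces", "faeces", "fecal", "faecal", "intestine"},
    "skin": {"skin", "epidermis"},
    "lung": {"lung", "sputum", "bronchoalveolar lavage"},
}


def normalize_site(raw: str) -> Optional[str]:
    """Return the controlled sample-site term for a raw annotation, or None if unmapped."""
    key = raw.strip().lower()
    for term, synonyms in SITE_SYNONYMS.items():
        if key in synonyms:
            return term
    return None


@dataclass
class Metagenome:
    """Hypothetical, heavily reduced metadata record for one metagenome."""
    accession: str
    raw_site: str
    age: Optional[float]
    diagnosis: Optional[str]
    country: Optional[str]


def select(records: Iterable[Metagenome], site: str,
           min_age: Optional[float] = None, max_age: Optional[float] = None,
           diagnosis: Optional[str] = None) -> List[str]:
    """Return accessions whose normalized site matches and that pass the optional filters.

    Records with a missing age are excluded whenever an age bound is given.
    """
    hits = []
    for r in records:
        if normalize_site(r.raw_site) != site:
            continue
        if min_age is not None and (r.age is None or r.age < min_age):
            continue
        if max_age is not None and (r.age is None or r.age > max_age):
            continue
        if diagnosis is not None and r.diagnosis != diagnosis:
            continue
        hits.append(r.accession)
    return hits


# Usage with made-up records whose site annotations are spelled inconsistently.
example = [
    Metagenome("sample_1", "Stool", 45, "colorectal cancer", "Germany"),
    Metagenome("sample_2", "faeces ", 30, None, "China"),
    Metagenome("sample_3", "skin", 50, None, "USA"),
    Metagenome("sample_4", "feces", None, "Crohn's disease", "Spain"),
]
print(select(example, "gut", min_age=18))  # ['sample_1', 'sample_2']
```

Normalizing before filtering is the point of the design: a query for "gut" finds all three differently spelled gut annotations, and `sample_4` is dropped from the age-bounded query only because its age is unknown.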