A Case Study for Large-Scale Human Microbiome Analysis Using JCVI’s Metagenomics Reports (METAREP)

As metagenomic studies continue to increase in number, sequence volume, and complexity, the scalability of biological analysis frameworks has become a rate-limiting factor for meaningful data interpretation. To address this issue, we have developed JCVI Metagenomics Reports (METAREP), an open source tool to query, browse, and compare extremely large volumes of metagenomic annotations. Here we present improvements to this software, including dynamic weighting of taxonomic and functional annotations, support for distributed searches, advanced clustering routines, and integration of additional annotation input formats. The utility of these improvements to data interpretation is demonstrated by applying multiple comparative analysis strategies to shotgun metagenomic data produced by the National Institutes of Health Roadmap for Biomedical Research Human Microbiome Project (HMP) (http://nihroadmap.nih.gov). Specifically, the scalability of the dynamic weighting feature is evaluated and established by applying it to the analysis of over 400 million weighted gene annotations derived from 14 billion short reads, as predicted by the HMP Unified Metabolic Analysis Network (HUMAnN) pipeline. Further, the capacity of METAREP to facilitate the identification and simultaneous comparison of taxonomic and functional annotations, including biological pathway and individual enzyme abundances, from hundreds of community samples is demonstrated through scenarios that describe how these data can be mined to answer biological questions related to the human microbiome. These strategies give users a reference for conducting similar large-scale metagenomic analyses with METAREP on their own sequence data; in this study, they also reveal insights into the nature and extent of variation in taxonomic and functional profiles across body habitats and individuals.
Over one thousand HMP WGS datasets and the latest open source code are available at http://www.jcvi.org/hmp-metarep.
