HirBin: high-resolution identification of differentially abundant functions in metagenomes

Background: Gene-centric analysis of metagenomics data provides information about the biochemical functions present in a microbiome under a given condition. The ability to identify significant differences in functions between metagenomes depends on accurate classification and quantification of the sequence reads (binning). However, biological effects acting on specific functions may be overlooked if the classes are too general.

Methods: Here we introduce High-Resolution Binning (HirBin), a new method for gene-centric analysis of metagenomes. HirBin combines supervised annotation with unsupervised clustering to bin sequence reads at a higher resolution. The supervised annotation is performed by matching sequence fragments to genes using well-established protein domain models, such as those from TIGRFAM, Pfam or COG. This is followed by unsupervised clustering, in which each functional domain is further divided into sub-bins based on sequence similarity. Finally, the differential abundance of the sub-bins is assessed statistically.

Results: We show that HirBin is able to identify biological effects that are only present at more specific functional levels. Furthermore, we show that changes affecting more specific functional levels are often diluted at the more general level and are therefore overlooked when analyzed using standard binning approaches.

Conclusions: HirBin improves the resolution of the gene-centric analysis of metagenomes and facilitates the biological interpretation of the results. HirBin is implemented as a Python package and is freely available for download at http://bioinformatics.math.chalmers.se/hirbin.
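To make the sub-binning idea and the dilution effect concrete, the following is a minimal illustrative sketch in Python. It is not HirBin's implementation. It assumes the supervised step has already assigned reads to genes within one functional domain and that the domain sequences are aligned to equal length. The greedy centroid clustering, the 0.8 identity threshold, the function names (`identity`, `greedy_subbins`, `subbin_counts`, `log2_fold_change`), the pseudocount, and the read counts are all hypothetical choices made for illustration. HirBin assesses differential abundance with a statistical test across replicates, whereas this sketch only compares single-sample fold changes.

```python
from math import log2


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_subbins(seqs: dict[str, str], min_id: float) -> dict[str, int]:
    """Hypothetical greedy centroid clustering: each sequence joins the first
    centroid it matches at >= min_id identity, otherwise it becomes a new centroid.
    Returns {sequence_id: sub_bin_index}."""
    centroids: list[str] = []
    assignment: dict[str, int] = {}
    for seq_id, seq in seqs.items():
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= min_id:
                assignment[seq_id] = idx
                break
        else:  # no centroid was close enough
            centroids.append(seq)
            assignment[seq_id] = len(centroids) - 1
    return assignment


def subbin_counts(gene_counts: dict[str, int], assignment: dict[str, int]) -> dict[int, int]:
    """Sum per-gene read counts into per-sub-bin counts."""
    totals: dict[int, int] = {}
    for gene, count in gene_counts.items():
        sub_bin = assignment[gene]
        totals[sub_bin] = totals.get(sub_bin, 0) + count
    return totals


def log2_fold_change(a: float, b: float, pseudo: float = 1.0) -> float:
    """log2 ratio of b over a; the pseudocount avoids division by zero."""
    return log2((b + pseudo) / (a + pseudo))


# Toy aligned sequences for genes annotated to one domain (made-up data).
domain_seqs = {
    "g1": "MKVLATGHED",
    "g2": "MKVLATGHEE",
    "g3": "QRSWYPNCFD",
    "g4": "QRSWYPNCFE",
}
assignment = greedy_subbins(domain_seqs, min_id=0.8)

# Made-up read counts: only the genes in the first sub-bin increase 4x.
control = {"g1": 50, "g2": 50, "g3": 400, "g4": 400}
treated = {"g1": 200, "g2": 200, "g3": 400, "g4": 400}

sub_a = subbin_counts(control, assignment)
sub_b = subbin_counts(treated, assignment)
for sb in sorted(sub_a):
    print(f"sub-bin {sb}: log2FC = {log2_fold_change(sub_a[sb], sub_b[sb]):.2f}")

# The same change measured on the domain as a single bin.
domain_lfc = log2_fold_change(sum(control.values()), sum(treated.values()))
print(f"whole domain: log2FC = {domain_lfc:.2f}")
```

With these toy numbers, sub-bin 0 shows a log2 fold change close to 2 and sub-bin 1 shows none, while the domain as a whole shows only about 0.41. The strong change in a minority sub-bin is diluted by the larger, unchanged sub-bin when reads are counted at the domain level.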
