IPCO: Inference of Pathways from Co-variance analysis

Background: Key aspects of microbiome research are the accurate identification of taxa and the profiling of their functionality. Amplicon profiling based on the 16S ribosomal DNA sequence is a ubiquitous technique for identifying taxa and profiling their abundance. However, it provides no information on their encoded functionality. Predictive tools that can accurately extrapolate the functional information of a microbiome from its taxonomic profile composition are therefore essential. At present, the applicability of these tools is limited because they require reference genomes from known species. We present IPCO (Inference of Pathways from Co-variance analysis), a new method for inferring functionality for 16S-based microbiome profiles independently of reference genomes. IPCO uses the biological co-variance observed between paired taxonomic and functional profiles and co-varies it with the queried dataset.

Results: IPCO outperforms other established methods in both sample and feature profile prediction. Validation results confirmed that IPCO can replicate observed biological associations between shotgun and metabolite profiles. In a comparative analysis, other popular 16S-based functional prediction tools performed significantly worse: their predicted functionality showed little to no correlation with paired shotgun features across samples.

Conclusions: IPCO can infer functionality from 16S datasets and significantly outperforms existing tools. IPCO is implemented in R and available from https://github.com/IPCO-Rlibrary/IPCO.
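The idea of learning co-variance from paired taxonomic and functional profiles and then applying it to a queried dataset can be illustrated with a toy sketch. The Python below is not IPCO's implementation (IPCO is an R package) and is not claimed to match its algorithm. It is a minimal illustration under stated assumptions: the function names, the toy matrices, and the choice of a plain taxa-by-function cross-covariance matrix as the transferred quantity are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = features


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so features vary around zero."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[v - means[j] for j, v in enumerate(row)] for row in m]


def cross_covariance(taxa: Matrix, functions: Matrix) -> Matrix:
    """Taxa-by-function sample covariance from paired reference profiles."""
    xc, yc = center_columns(taxa), center_columns(functions)
    n = len(taxa)
    return [
        [sum(xc[s][i] * yc[s][k] for s in range(n)) / (n - 1)
         for k in range(len(functions[0]))]
        for i in range(len(taxa[0]))
    ]


def project_query(query_taxa: Matrix, cov: Matrix) -> Matrix:
    """Score each query sample on each function via the learned covariance."""
    qc = center_columns(query_taxa)
    return [
        [sum(row[i] * cov[i][k] for i in range(len(cov)))
         for k in range(len(cov[0]))]
        for row in qc
    ]


# Hypothetical paired reference data: 3 samples, 2 taxa, 1 function.
ref_taxa = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
ref_func = [[2.0], [4.0], [6.0]]

# Hypothetical 16S-only query data with the same two taxa.
query = [[1.0, 0.0], [3.0, 2.0]]

cov = cross_covariance(ref_taxa, ref_func)
scores = project_query(query, cov)  # relative scores, not abundances
```

The scores are centred, relative values: they rank query samples by inferred functional signal rather than estimating absolute pathway abundance. A real analysis would also need compositional transformations and matching of taxa between the reference and the query tables, which this sketch omits.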
