An Integrative Framework for Functional Analysis of Cattle Rumen Microbiomes

Metagenomics is the study of environmental microbial communities and has broad applications in biological research. This paper studies the role of microbial communities in the cattle rumen and their relation to the use of probiotic diet supplements, as part of the EU H2020 MetaPlat project (http://www.metaplat.eu). We propose and evaluate a computational framework that classifies 16S rRNA samples from the Bos taurus (cattle) rumen microbiome by diet phenotype. We benchmark phylogeny-driven methods, which integrate biological domain knowledge about the relationships between taxa, against non-phylogenetic methods that use only the raw abundances. The integrative approach, which incorporates phylogenetic tree structure into machine learning (ML) modelling, achieved high predictive performance, with an Accuracy of 0.925 and a Kappa of 0.900, when classifying cattle microbiomes into diets supplemented with oil, with nitrate, with a combination of both, and controls.
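To make the two reported metrics concrete, the following minimal sketch computes accuracy and Cohen's Kappa from a confusion matrix for a four-class problem. The function name `accuracy_and_kappa` and the counts in `toy` are hypothetical and chosen for illustration only; they are not the study's data, and the study's actual evaluation code is not shown in the source.

```python
from typing import Sequence, Tuple


def accuracy_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Chance agreement: sum over classes of (true share) * (predicted share).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts (NOT from the paper): 4 diet classes, 10 samples each.
# Class order is an assumption: control, oil, nitrate, combination.
toy = [
    [9, 1, 0, 0],
    [0, 9, 1, 0],
    [0, 0, 9, 1],
    [0, 0, 0, 10],
]

acc, kappa = accuracy_and_kappa(toy)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")
```

When the true classes are exactly balanced across four groups, the chance agreement is 0.25 regardless of how the errors are distributed, so Kappa reduces to (accuracy - 0.25) / 0.75. Under that assumption, an accuracy of 0.925 corresponds to a Kappa of 0.900, which matches the pair of values reported above.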
