A Phylogeny-aware Feature Ranking for Classification of Cattle Rumen Microbiome

Metagenomics is increasingly used to study environmental microbial communities and their role in animal functions. This paper studies the functional role of the microbial communities present in cattle (Bos taurus) and their relation to dietary supplement usage. The functional study was conducted as part of the EU H2020 MetaPlat project (http://www.metaplat.eu). In this research, we propose a novel phylogeny-driven approach to classify 16S rRNA samples from the cattle rumen microbiome and relate them to the functional phenotype of diet (referred to as functional analysis). Phylogeny covers biological relationships across different taxonomical levels, combined with their respective evolutionary measures. The analysis rests on a novel method based on phylogeny-adjusted distance-based indices, which are used to rank the microbial feature space derived from the topology of the phylogenetic tree. This integrative approach, which incorporates phylogeny into feature engineering as part of machine learning (ML) modeling, achieved high predictive performance, with an Accuracy of 0.962 and a Kappa of 0.950, in classifying cattle microbiome samples into four diet phenotypes: supplemented with oil, supplemented with nitrate, combined (oil and nitrate), and controls.
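The two reported metrics can be illustrated with a minimal sketch. It assumes that Kappa refers to Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The function name `accuracy_and_kappa` and the toy labels are hypothetical and are not taken from the paper's data or code.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)

    # Observed agreement: fraction of samples predicted correctly (accuracy).
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: for each class, the product of its true and
    # predicted marginal frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    # Kappa is undefined when chance agreement is already perfect.
    if expected == 1.0:
        return observed, float("nan")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy example (made-up labels): four diet phenotypes, one misclassified sample.
y_true = ["control", "control", "oil", "oil",
          "nitrate", "nitrate", "combined", "combined"]
y_pred = ["control", "control", "oil", "combined",
          "nitrate", "nitrate", "combined", "combined"]

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")
```

With four balanced classes, chance agreement is about 0.25, so kappa stays close to accuracy at high performance levels, which is in line with the similar Accuracy and Kappa values reported above.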
