PopPhy-CNN: A Phylogenetic Tree Embedded Architecture for Convolutional Neural Networks to Predict Host Phenotype From Metagenomic Data

Accurate prediction of the host phenotype from a metagenomic sample and identification of the associated microbial markers are important for understanding potential host-microbiome interactions related to disease initiation and progression. We introduce PopPhy-CNN, a novel convolutional neural network (CNN) learning framework that exploits the phylogenetic structure of microbial taxa for host phenotype prediction. Our approach takes as input a 2D matrix that represents the phylogenetic tree populated with the relative abundance of microbial taxa in a metagenomic sample. This conversion allows CNNs to explore the spatial relationship of the taxonomic annotations on the tree together with their quantitative characteristics in metagenomic data. We show that our model is competitive with other available methods on nine metagenomic datasets of moderate size for binary classification. On synthetic and biological datasets, we show the superior and robust performance of our model for multi-class classification. Furthermore, we design a novel scheme for feature extraction from the learned CNN models and demonstrate improved performance when the extracted features are used. PopPhy-CNN is a practical deep learning framework for predicting host phenotype that also facilitates the retrieval of predictive microbial taxa.
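The tree-to-matrix input described above can be illustrated with a minimal sketch. The abstract does not specify the exact layout, so the arrangement used here (one row per tree level, nodes placed left to right, unused cells zero-padded, internal nodes holding the sum of their descendants' abundances) is an assumption, and the toy tree, taxon names, and function names are hypothetical.

```python
import numpy as np

# Hypothetical toy tree: parent -> ordered children. Nodes absent as keys are leaves (taxa).
TREE = {
    "root": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["b1"],
}


def populate(tree, node, leaf_abundance, totals):
    """Store in `totals` the summed abundance of all leaves beneath `node`."""
    children = tree.get(node, [])
    if not children:
        totals[node] = leaf_abundance.get(node, 0.0)
    else:
        totals[node] = sum(
            populate(tree, child, leaf_abundance, totals) for child in children
        )
    return totals[node]


def tree_to_matrix(tree, root, leaf_abundance):
    """Lay the populated tree out as a 2D array (assumed layout, see lead-in)."""
    totals = {}
    populate(tree, root, leaf_abundance, totals)

    # Collect nodes level by level, preserving left-to-right child order.
    levels = []
    current = [root]
    while current:
        levels.append(current)
        current = [child for node in current for child in tree.get(node, [])]

    # One row per level; rows shorter than the widest level are zero-padded.
    width = max(len(level) for level in levels)
    matrix = np.zeros((len(levels), width))
    for row, level in enumerate(levels):
        for col, node in enumerate(level):
            matrix[row, col] = totals[node]
    return matrix


# Relative abundances of the leaf taxa in one hypothetical sample (they sum to 1).
sample = {"a1": 0.5, "a2": 0.25, "b1": 0.25}
matrix = tree_to_matrix(TREE, "root", sample)
print(matrix)
```

A matrix of this kind can be fed to a standard 2D convolutional layer, so that neighbouring cells correspond to taxa that are close on the tree; how closely this matches the layout used by PopPhy-CNN should be checked against the full paper.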