The Phylogenetic Tree based Deep Forest for Metagenomic Data Classification

Microorganisms are closely related to human health and influence the development of various diseases. In personalized medicine, it is therefore important to identify the relationships between microorganisms and phenotypes (such as healthy or diseased status) by analyzing microbial abundance. Deep learning allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have improved the state of the art in speech recognition, visual object recognition, and object detection. However, current deep models are typically neural networks, that is, multiple layers of parameterized differentiable nonlinear modules that can be trained by backpropagation. It is therefore interesting to explore other deep learning models for tasks with small sample sizes and high-dimensional data. A distinctive feature of microbial data is its phylogenetic tree structure, which can be embedded to improve classification performance. In this work, to further improve metagenomic classification, we propose a deep model named Cascade Deep Forest, which preserves the spatial structure between nodes by embedding phylogenetic tree information. Our results demonstrate that: 1) the modified cascade structure enhances the classification performance of Deep Forest; 2) embedding phylogenetic tree information also improves the classification performance of the models; and 3) Deep Forest achieves performance highly competitive with deep neural networks.
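To make the cascade idea concrete, the following is a minimal sketch of one generic cascade level in the style of the original Deep Forest proposal, written with scikit-learn. It is not the modified cascade structure or the phylogenetic tree embedding proposed in this work, neither of which is specified in the abstract. The function name `cascade_level`, its parameters, and the synthetic abundance matrix are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def cascade_level(X_orig, X_in, y, n_estimators=50, cv=3, seed=0):
    """Hypothetical helper: one generic cascade level.

    Each forest produces out-of-fold class probabilities for X_in; these
    class vectors are concatenated with the original features X_orig to
    form the input of the next level.
    """
    forests = [
        RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
        ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed),
    ]
    # Out-of-fold predictions reduce the risk of overfitting between levels.
    probas = [
        cross_val_predict(f, X_in, y, cv=cv, method="predict_proba")
        for f in forests
    ]
    class_vector = np.hstack(probas)
    return np.hstack([X_orig, class_vector]), class_vector


# Synthetic stand-in for an abundance table (assumption, not real data):
# 60 samples, 200 taxa, binary phenotype labels.
rng = np.random.default_rng(0)
X = rng.random((60, 200))
y = np.array([0, 1] * 30)

X_level1, cv1 = cascade_level(X, X, y)         # level 1 sees raw features
X_level2, cv2 = cascade_level(X, X_level1, y)  # level 2 sees features + class vectors
```

In this sketch each level adds two class probabilities per forest to the original feature vector, so the input width grows by a fixed amount at the first level and stays constant afterwards. In practice, the number of levels would be chosen by monitoring validation performance.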
