TaxoNN: ensemble of neural networks on stratified microbiome data for disease prediction

Abstract

Motivation: Research supports the potential use of the microbiome as a predictor of some diseases. Microbiome data are complex, and the hierarchical taxonomy of microbial operational taxonomic units (OTUs) induces an inherent correlation among them. Motivated by these findings, we propose a novel machine learning method that uses a stratified approach to group OTUs into phylum clusters. A convolutional neural network (CNN) is trained on each cluster individually. Then, through an ensemble learning approach, the features obtained from the clusters are concatenated to improve prediction accuracy. This two-step approach, stratification followed by combining multiple CNNs, captures the relationships between OTUs that share a phylum more efficiently than a single CNN that ignores OTU correlations.

Results: We tested the model on simulated datasets containing 168 OTUs in 200 cases and 200 controls. Thirty-two OTUs were randomly selected as potentially associated with disease risk, and interactions among three OTUs were used to introduce non-linearity. To demonstrate the model's effectiveness, we also applied the method to two human microbiome studies: (i) cirrhosis, with 118 cases and 114 controls; and (ii) type 2 diabetes (T2D), with 170 cases and 174 controls. In extensive comparisons against conventional machine learning techniques, we obtained mean AUC values of 0.88, 0.92 and 0.75 on the simulated, cirrhosis and T2D data, respectively, a consistent improvement (5%, 3% and 7%, respectively) over the next best-performing method, Random Forest.

Availability and implementation: https://github.com/divya031090/TaxoNN_OTU.

Supplementary information: Supplementary data are available at Bioinformatics online.
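The stratify-then-combine idea can be illustrated with a minimal sketch, assuming each OTU column carries a phylum label. This is not the TaxoNN implementation (the repository above holds that). The names `stratify_by_phylum`, `conv1d_relu_maxpool`, `stratified_features` and `predict_proba` are hypothetical. The convolution kernels and output weights are fixed, untrained illustrative values. A single 1D convolution with ReLU and global max pooling stands in for each per-phylum CNN, and a logistic unit stands in for the ensemble's output layer.

```python
import math
from collections import defaultdict


def stratify_by_phylum(otu_phyla):
    """Step 1: group OTU column indices by phylum label, keeping column order."""
    clusters = defaultdict(list)
    for idx, phylum in enumerate(otu_phyla):
        clusters[phylum].append(idx)
    return dict(clusters)


def conv1d_relu_maxpool(values, kernel):
    """Valid 1D convolution (cross-correlation) + ReLU + global max pooling."""
    k = len(kernel)
    if len(values) < k:
        # Zero-pad clusters shorter than the kernel so every phylum yields a feature.
        values = list(values) + [0.0] * (k - len(values))
    activations = [
        max(0.0, sum(w * x for w, x in zip(kernel, values[i:i + k])))
        for i in range(len(values) - k + 1)
    ]
    return max(activations)


def stratified_features(sample, clusters, kernels_by_phylum):
    """Step 2: extract features within each phylum, then concatenate them."""
    features = []
    for phylum in sorted(clusters):  # fixed phylum order gives a stable feature layout
        sub = [sample[i] for i in clusters[phylum]]
        features.extend(conv1d_relu_maxpool(sub, kern) for kern in kernels_by_phylum[phylum])
    return features


def predict_proba(features, weights, bias):
    """Hypothetical logistic output unit applied to the concatenated features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy example (hypothetical data): six OTUs from three phyla, one sample.
otu_phyla = ["Firmicutes", "Bacteroidetes", "Firmicutes",
             "Proteobacteria", "Bacteroidetes", "Firmicutes"]
sample = [0.10, 0.30, 0.05, 0.02, 0.40, 0.13]

clusters = stratify_by_phylum(otu_phyla)
# Hypothetical, untrained kernels: a "difference" and an "average" filter per phylum.
kernels = {p: [[1.0, -1.0], [0.5, 0.5]] for p in clusters}
feats = stratified_features(sample, clusters, kernels)
prob = predict_proba(feats, weights=[1.0] * len(feats), bias=-0.5)
print(clusters)
print([round(f, 3) for f in feats], round(prob, 3))
```

The design point the sketch tries to capture: because each phylum's OTUs are convolved only with one another, every kernel window covers OTUs from a single phylum, whereas one CNN over all OTUs in arbitrary column order would mix OTUs from different phyla within a window. The concatenated vector has one entry per (phylum, kernel) pair, six in this toy example.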