A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen-relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739) but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.

IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible practices for applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.
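The AUROC and IQR figures quoted above can be illustrated with a minimal sketch. It assumes that each AUROC is computed on held-out data and that the median and IQR are taken over repeated data splits; the number of splits is not stated here. The function names `auroc` and `summarize` are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import median, quantiles


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen case (label 1)
    scores higher than a randomly chosen control (label 0).
    Ties count as half a win (Mann-Whitney formulation)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def summarize(aurocs):
    """Median and interquartile range of AUROCs from repeated splits
    (assumed reporting convention, not confirmed by the text)."""
    q1, _, q3 = quantiles(aurocs, n=4, method="inclusive")
    return median(aurocs), (q1, q3)


# Toy example with made-up scores: three of the four case/control
# pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Reporting a median with its IQR rather than a single AUROC shows how much the estimate depends on which samples land in the held-out set, which is why the two best models above are compared with their ranges and not only their point estimates.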
