Comparative study of classifiers for human microbiome data.

Abstract Accumulated evidence has shown that commensal microorganisms play key roles in human physiology and disease. Dysbiosis of human-associated microbial communities, often referred to as the human microbiome, has been associated with many diseases. Applying supervised classification analysis to human microbiome data can help identify subsets of microorganisms that are highly discriminative, and hence build prediction models that accurately classify unlabeled samples. Here, we systematically compare two state-of-the-art ensemble classifiers, Random Forests (RF) and eXtreme Gradient Boosting decision trees (XGBoost), and two traditional methods, the elastic net (ENET) and Support Vector Machine (SVM), in the classification analysis of 29 benchmark human microbiome datasets. We find that XGBoost outperforms all other methods in only a few benchmark datasets. In the remaining benchmark datasets, XGBoost, RF and ENET display comparable performance. The training time of XGBoost is much longer than that of the other methods, partially due to its much larger number of hyperparameters. We also find that the most important features selected by the four classifiers partially overlap. Yet the difference between their classification performance is almost independent of this overlap.
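The partial overlap between the most important features of different classifiers can be illustrated with a minimal sketch. It assumes that each classifier yields one importance score per feature (for example, impurity-based importance for tree ensembles or absolute coefficients for ENET) and that overlap is measured as the Jaccard index of the top-k feature sets. The abstract does not state which overlap measure was used, so the Jaccard index, the helper names, the taxon names and the scores below are all illustrative assumptions, not the authors' method or data.

```python
from itertools import combinations


def top_k_features(importances, k):
    """Return the set of the k feature names with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined here as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(importances_by_model, k):
    """Jaccard overlap of the top-k feature sets for every pair of models."""
    tops = {name: top_k_features(imp, k) for name, imp in importances_by_model.items()}
    return {(m1, m2): jaccard(tops[m1], tops[m2]) for m1, m2 in combinations(tops, 2)}


# Hypothetical importance scores (made-up taxa and values, for illustration only).
toy_importances = {
    "RF": {"taxon_A": 0.40, "taxon_B": 0.30, "taxon_C": 0.20, "taxon_D": 0.10},
    "XGBoost": {"taxon_A": 0.50, "taxon_C": 0.25, "taxon_D": 0.15, "taxon_B": 0.10},
    "ENET": {"taxon_B": 0.90, "taxon_D": 0.60, "taxon_A": 0.20, "taxon_C": 0.00},
}

overlaps = pairwise_overlap(toy_importances, k=2)
for pair, score in overlaps.items():
    print(pair, round(score, 3))
```

An overlap near 1 would mean two classifiers rank nearly the same features highest, and a value near 0 would mean they rely on largely different features. Comparing these values with the gap in classification performance between the same pair of models is one way to examine whether the two quantities are related.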
