Machine Learning Reveals Missing Edges and Putative Interaction Mechanisms in Microbial Ecosystem Networks

ABSTRACT Microbes affect each other’s growth in multiple, often elusive, ways. The ensuing interdependencies form complex networks, believed to reflect taxonomic composition as well as community-level functional properties and dynamics. The elucidation of these networks is often pursued by measuring pairwise interactions in coculture experiments. However, the combinatorial complexity precludes an exhaustive experimental analysis of pairwise interactions, even for moderately sized microbial communities. Here, we used a machine learning random forest approach to address this challenge. In particular, we show how partial knowledge of a microbial interaction network, combined with trait-level representations of individual microbial species, can provide accurate inference of missing edges in the network and putative mechanisms underlying the interactions.
We applied our algorithm to three case studies: an experimentally mapped network of interactions between auxotrophic Escherichia coli strains, a community of soil microbes, and a large in silico network of metabolic interdependencies between 100 human gut-associated bacteria. For this last case, 5% of the network was sufficient to predict the remaining 95% with 80% accuracy, and the mechanistic hypotheses produced by the algorithm accurately reflected known metabolic exchanges. Our approach, broadly applicable to any microbial or other ecological network, may drive the discovery of new interactions and new molecular mechanisms, both for therapeutic interventions involving natural communities and for the rational design of synthetic consortia.

IMPORTANCE Different organisms in a microbial community may drastically affect each other’s growth phenotypes, significantly affecting the community dynamics, with important implications for human and environmental health. Novel culturing methods and the decreasing costs of sequencing will gradually enable high-throughput measurements of pairwise interactions in systematic coculturing studies. However, a thorough characterization of all interactions that occur within a microbial community is greatly limited both by the combinatorial complexity of possible assortments and by the limited biological insight that interaction measurements typically provide without laborious specific follow-ups. Here, we show how a simple and flexible formal representation of microbial pairs can be used for the classification of interactions via machine learning. The approach we propose predicts with high accuracy the outcome of yet-to-be performed experiments and generates testable hypotheses about the mechanisms of specific interactions.
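
As a rough illustration of the setup described above, the following Python sketch builds a trait-level representation for each pair of species and separates a small "known" portion of the network from the pairs still to be predicted. It is only a sketch under stated assumptions, not the authors' implementation: the random binary traits, the trait count, the treatment of pairs as unordered, and the helper names `pair_features` and `split_known_unknown` are all hypothetical.

```python
import itertools
import random


def pair_features(traits_a, traits_b):
    """Represent a pair of species by concatenating their trait vectors.

    Concatenation is an assumed, illustrative encoding of a pair.
    """
    return list(traits_a) + list(traits_b)


def split_known_unknown(pairs, known_fraction, seed=0):
    """Randomly split pairs into a measured (known) set and an unmeasured set."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_known = int(known_fraction * len(shuffled))
    return shuffled[:n_known], shuffled[n_known:]


# Hypothetical toy data: 100 species, each with 8 random binary traits.
rng = random.Random(42)
n_species, n_traits = 100, 8
traits = {
    s: [rng.randint(0, 1) for _ in range(n_traits)] for s in range(n_species)
}

# Assumption: pairs are unordered, giving C(100, 2) = 4950 candidate edges.
pairs = list(itertools.combinations(range(n_species), 2))

# Keep 5% of the pairs as the known part of the network.
known, unknown = split_known_unknown(pairs, known_fraction=0.05)

# Feature rows for the known pairs; interaction labels would come from
# coculture measurements or simulations and are not fabricated here.
X_known = [pair_features(traits[a], traits[b]) for a, b in known]
X_unknown = [pair_features(traits[a], traits[b]) for a, b in unknown]

print(len(pairs), len(known), len(unknown), len(X_known[0]))
```

Given measured interaction labels for the known pairs, `X_known` could be passed to any random forest classifier, which would then be asked to predict the interaction class for each row of `X_unknown`.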
