Efficient Examination of Soil Bacteria Using Probabilistic Graphical Models

This paper describes a novel approach to studying bacterial relationships in soil datasets using probabilistic graphical models. We demonstrate how to access and reformat publicly available datasets so that machine learning techniques can be applied to them. We first learn a Bayesian network, from which independencies between bacterial community characteristics can be read in linear time. These independencies help in understanding the semantic relationships between bacteria within communities. Next, we learn a sum-product network, in which inference can be performed in linear time. Inference can answer traditional queries, which involve posterior probabilities, or most probable explanation (MPE) queries, which request the most likely values of the non-evidence variables given the evidence. Our results extend the literature by showing that known relationships between soil bacteria, previously established in one or a few datasets, in fact hold across at least 3500 diverse datasets. This study paves the way for future large-scale studies of agricultural, health, and environmental applications for which data are publicly available.
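The two query types mentioned above can be illustrated with a minimal sketch of a sum-product network over two binary variables. This is not the network learned in the paper: the variable names `taxon_a` and `taxon_b`, the structure, and all weights and probabilities are hypothetical values chosen for illustration. The posterior is obtained from two bottom-up passes, and the MPE query uses the common max-product pass, which is exact only for some network classes and is otherwise an approximation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    var: str      # variable name (hypothetical)
    p_one: float  # P(var = 1), an illustrative value


@dataclass
class Product:
    children: List["Node"]


@dataclass
class Sum:
    weighted: List[Tuple[float, "Node"]]  # (weight, child); weights sum to 1


Node = Union[Leaf, Product, Sum]
Assignment = Dict[str, int]


def evaluate(node: Node, evidence: Assignment) -> float:
    """Bottom-up pass. The toy network is a tree, so each node is visited once."""
    if isinstance(node, Leaf):
        if node.var not in evidence:
            return 1.0  # variable is marginalised out
        return node.p_one if evidence[node.var] == 1 else 1.0 - node.p_one
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= evaluate(child, evidence)
        return result
    return sum(w * evaluate(child, evidence) for w, child in node.weighted)


def posterior(root: Node, query: Assignment, evidence: Assignment) -> float:
    """P(query | evidence) as a ratio of two network evaluations."""
    return evaluate(root, {**evidence, **query}) / evaluate(root, evidence)


def max_product(node: Node, evidence: Assignment) -> Tuple[float, Assignment]:
    """Max-product pass: sums become maxima; returns (score, assignment)."""
    if isinstance(node, Leaf):
        if node.var in evidence:
            return evaluate(node, evidence), {}
        best = 1 if node.p_one >= 0.5 else 0
        return max(node.p_one, 1.0 - node.p_one), {node.var: best}
    if isinstance(node, Product):
        score, assignment = 1.0, {}
        for child in node.children:
            s, a = max_product(child, evidence)
            score *= s
            assignment.update(a)
        return score, assignment
    best_score, best_assignment = -1.0, {}
    for w, child in node.weighted:
        s, a = max_product(child, evidence)
        if w * s > best_score:
            best_score, best_assignment = w * s, a
    return best_score, best_assignment


# Hypothetical mixture of two "community types" (all numbers are made up).
root = Sum([
    (0.6, Product([Leaf("taxon_a", 0.9), Leaf("taxon_b", 0.8)])),
    (0.4, Product([Leaf("taxon_a", 0.2), Leaf("taxon_b", 0.1)])),
])

evidence = {"taxon_a": 1}
print(posterior(root, {"taxon_b": 1}, evidence))  # posterior query
print(max_product(root, evidence)[1])             # MPE query
```

Each pass touches every node a fixed number of times, which is the sense in which inference cost grows linearly with network size. For a network with shared sub-structure, the recursive calls would need caching to keep that property.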
