Ontology-aware deep learning enables ultrafast and interpretable source tracking among sub-million microbial community samples from hundreds of niches

The taxonomic structure of a microbial community sample is highly habitat-specific, which makes it possible to trace the niche from which a sample originated (source tracking). Current methods struggle when the numbers of samples and niches are orders of magnitude larger than those in current use: they cannot source-track samples accurately or in a timely manner, which makes knowledge discovery from sub-million heterogeneous samples difficult. Here, we introduce ONN4MST (https://github.com/HUST-NingKang-Lab/ONN4MST), a deep learning method based on an Ontology-aware Neural Network that takes into account the ontology structure of niches and the relationships among samples from these ontologically organized niches. ONN4MST's superiority in accuracy, speed and robustness has been demonstrated; for example, it achieved an accuracy of 0.99 and an AUC of 0.97 in a microbial source tracking experiment involving 125,823 samples and 114 niches. Moreover, ONN4MST has been applied to several source tracking tasks, showing that it can provide highly interpretable results for samples from previously less-studied niches, detect microbial contaminants, and identify similar samples from ontologically remote niches, all with high fidelity.
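To make the notion of ontologically organized niches concrete, the following minimal Python sketch shows one way a niche path could be turned into one label vector per ontology layer. It is an illustration only and not the ONN4MST implementation: the niche names, the ":" separator, and the functions `ancestors_and_self`, `layerwise_labels` and `encode` are all assumptions introduced here.

```python
from typing import Dict, List, Set


def ancestors_and_self(niche_path: str, sep: str = ":") -> List[str]:
    """Return every prefix of a niche path, from the root down to the niche itself."""
    parts = niche_path.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


def layerwise_labels(niche_paths: List[str], sep: str = ":") -> Dict[int, List[str]]:
    """Collect the distinct ontology nodes found at each depth (layer)."""
    layers: Dict[int, Set[str]] = {}
    for path in niche_paths:
        for depth, node in enumerate(ancestors_and_self(path, sep)):
            layers.setdefault(depth, set()).add(node)
    return {depth: sorted(nodes) for depth, nodes in sorted(layers.items())}


def encode(niche_path: str, layers: Dict[int, List[str]], sep: str = ":") -> Dict[int, List[int]]:
    """One-hot encode a niche at every layer, so coarse and fine labels coexist."""
    nodes = ancestors_and_self(niche_path, sep)
    encoded: Dict[int, List[int]] = {}
    for depth, labels in layers.items():
        # A niche shallower than this layer has no target here (all zeros).
        target = nodes[depth] if depth < len(nodes) else None
        encoded[depth] = [1 if label == target else 0 for label in labels]
    return encoded


# Hypothetical niche paths, chosen only for illustration.
NICHES = [
    "root:Host-associated:Human:Gut",
    "root:Host-associated:Human:Skin",
    "root:Environmental:Aquatic:Marine",
]

LAYERS = layerwise_labels(NICHES)
GUT = encode("root:Host-associated:Human:Gut", LAYERS)
```

Under these assumptions, a model trained against such layered targets can still report a coarse label (for example, host-associated) when the finest layer is uncertain, which is one plausible reading of how an ontology-aware design supports interpretable results for less-studied niches.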

[1]  Pei-Ying Hong,et al.  Assessing the Groundwater Quality at a Saudi Arabian Agricultural Site and the Occurrence of Opportunistic Pathogens on Irrigated Food Produce , 2015, International journal of environmental research and public health.

[2]  Amnon Amir,et al.  Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer , 2016, Nature Medicine.

[3]  Eran Halperin,et al.  FEAST: fast expectation-maximization for microbial source tracking , 2019, Nature Methods.

[4]  Salvador Lladó,et al.  Drivers of microbial community structure in forest soils , 2018, Applied Microbiology and Biotechnology.

[5]  Rick L. Stevens,et al.  A communal catalogue reveals Earth’s multiscale microbial diversity , 2017, Nature.

[6]  Jennifer M. Fettweis,et al.  The Integrative Human Microbiome Project , 2019, Nature.

[7]  R. Knight,et al.  UniFrac: a New Phylogenetic Method for Comparing Microbial Communities , 2005, Applied and Environmental Microbiology.

[8]  James M Mullin,et al.  The Host Microbiome Regulates and Maintains Human Health: A Primer and Perspective for Non-Microbiologists. , 2017, Cancer research.

[9]  R. Knight,et al.  The Human Microbiome Project , 2007, Nature.

[10]  Yan Wang,et al.  Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families , 2019, Genome Biology.

[11]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[12]  Robert D. Finn,et al.  MGnify: the microbiome analysis resource in 2020 , 2019, Nucleic Acids Res..

[13]  S. Hird,et al.  Spatial heterogeneity of the shorebird gastrointestinal microbiome , 2020, Royal Society Open Science.

[14]  Kenneth Timmis,et al.  Microbiome Yarns: microbiome of the built environment, paranormal microbiology, and the power of single cell genomics1,2,3,4 , 2018, Microbial biotechnology.

[15]  Mutsunori Tokeshi,et al.  Species Abundance Patterns and Community Structure , 1993 .

[16]  Rob Knight,et al.  Bayesian community-wide culture-independent microbial source tracking , 2011, Nature Methods.

[17]  Georgios A. Pavlopoulos,et al.  Protein structure determination using metagenome sequence data , 2017, Science.

[18]  Blair Sterba-Boatwright,et al.  Novel application of a statistical technique, Random Forests, in a bacterial source tracking study. , 2010, Water research.

[19]  Jianhua Lin,et al.  Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.

[20]  J. M. Simpson,et al.  Microbial source tracking: state of the science. , 2002, Environmental science & technology.

[21]  Gregory B. Gloor,et al.  The Gut Microbiota of Healthy Aged Chinese Is Similar to That of the Healthy Young , 2017, mSphere.

[22]  P. O’Toole,et al.  Composition and temporal stability of the gut microbiota in older persons , 2015, The ISME Journal.

[23]  P. Brigidi,et al.  Through Ageing, and Beyond: Gut Microbiota and Inflammatory Status in Seniors and Centenarians , 2010, PloS one.

[24]  Rob Knight,et al.  Longitudinal analysis of microbial interaction between humans and the indoor environment , 2014, Science.

[25]  Kang Ning,et al.  Meta-Prism: Ultra-fast and highly accurate microbial community structure search utilizing dual indexing and parallel computation , 2020, Briefings Bioinform..

[26]  Rob Knight,et al.  The Earth Microbiome project: successes and aspirations , 2014, BMC Biology.