Multiple-Disease Detection and Classification across Cohorts via Microbiome Search

ABSTRACT Microbiome-based disease classification depends on well-validated disease-specific models or a priori organismal markers. These are lacking for many diseases. Here, we present an alternative, search-based strategy for disease detection and classification, which detects diseased samples via their outlier novelty versus a database of samples from healthy subjects and then compares these to databases of samples from patients. Our strategy’s precision, sensitivity, and speed outperform model-based approaches. In addition, it is more robust to platform heterogeneity and to contamination in 16S rRNA gene amplicon data sets. This search-based strategy shows promise as an important first step in microbiome big-data-based diagnosis.

IMPORTANCE Here, we present a search-based strategy for disease detection and classification, which detects diseased samples via their outlier novelty versus a database of samples from healthy subjects and then compares them to databases of samples from patients. This approach enables the identification of microbiome states associated with disease even in the presence of different cohorts, multiple sequencing platforms, or significant contamination.
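The two-step logic described above (flag a sample as an outlier against a healthy database, then search patient databases for its closest matches) can be illustrated with a minimal sketch. This is not the authors' implementation. The similarity measure (a Bray-Curtis-style overlap of relative abundances), the novelty definition (one minus the mean similarity of the top matches), the threshold, the value of `k`, and all function names and toy profiles are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]            # taxon name -> relative abundance
Database = List[Tuple[str, Profile]]  # (label, profile) pairs


def similarity(a: Profile, b: Profile) -> float:
    """Bray-Curtis-style similarity (assumed stand-in for the search score)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(v, b.get(taxon, 0.0)) for taxon, v in a.items())
    return 2.0 * shared / total


def top_matches(query: Profile, db: Database, k: int) -> List[Tuple[float, str]]:
    """Return the k most similar database entries as (score, label) pairs."""
    scored = sorted(((similarity(query, p), label) for label, p in db), reverse=True)
    return scored[:k]


def novelty(query: Profile, healthy_db: Database, k: int = 2) -> float:
    """Assumed novelty score: 1 minus the mean similarity of the top-k healthy hits."""
    hits = top_matches(query, healthy_db, k)
    return 1.0 - sum(score for score, _ in hits) / len(hits)


def classify(query: Profile, healthy_db: Database, patient_db: Database,
             threshold: float = 0.5, k: int = 2) -> str:
    """Step 1: outlier detection versus healthy samples.
    Step 2: similarity-weighted vote among the top-k patient matches."""
    if novelty(query, healthy_db, k) < threshold:
        return "healthy-like"
    votes: Dict[str, float] = {}
    for score, label in top_matches(query, patient_db, k):
        votes[label] = votes.get(label, 0.0) + score
    return max(votes, key=lambda label: votes[label])


# Toy data (hypothetical taxa and disease labels).
healthy_db: Database = [("healthy", {"A": 0.6, "B": 0.4}),
                        ("healthy", {"A": 0.5, "B": 0.5})]
patient_db: Database = [("disease_X", {"C": 0.7, "A": 0.3}),
                        ("disease_Y", {"D": 0.8, "B": 0.2})]

typical_sample: Profile = {"A": 0.55, "B": 0.45}
outlier_sample: Profile = {"C": 0.8, "A": 0.2}

print(classify(typical_sample, healthy_db, patient_db))
print(classify(outlier_sample, healthy_db, patient_db))
```

In this sketch a sample close to the healthy database never reaches the patient search, which mirrors the ordering of the two steps in the abstract. A real search engine would replace the linear scan with an index and the overlap score with its own similarity measure.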
