ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries

Identifying significant features is a critical task in microbiome studies, complicated by the high dimensionality and heterogeneity of microbial data. This complexity makes it difficult to separate signal from noise. For instance, in differential abundance testing, multiple testing adjustments tend to be overconservative, because the probability of a type I error (false positive) increases dramatically with the number of hypotheses. We frame the significance identification problem as a dynamic process of separating signal from a randomized background. If the groups defined by the grouping factor truly differ in the original data, signal and noise in this process move from fully mixed to clearly separated. We propose the progressive permutation method to carry out this process and display the converging trend. The method progressively permutes the grouping factor labels of the microbiome samples and performs multiple differential abundance tests in each permutation scenario. We then compare the signal strength of the top hits from the original data with their performance under permutation; if these top hits are true positives, their signal strength shows an apparent decreasing trend across the permutation scenarios. To help users understand the robustness of the discoveries and identify the best hits, we developed a user-friendly and efficient RShiny tool. Simulations and applications to real data show that the proposed method can evaluate the overall association between the microbiome and the grouping factor, rank the robustness of the discovered microbes, and list the discoveries together with their effect sizes and individual abundances.
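The idea of progressively permuting labels can be illustrated with a minimal Python sketch. This is not the authors' RShiny implementation: the simulated Gaussian "abundances", the rank-sum test with a normal approximation, the significance threshold, and all function names (`rank_sum_pvalue`, `permute_labels`, `progressive_permutation`) are assumptions chosen for illustration only. The sketch shuffles a growing number of group labels and counts, at each level, how many features still pass the test.

```python
import random
from statistics import NormalDist, mean


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, assumes no ties)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the ranks held by the first group.
    r1 = sum(rank for rank, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def permute_labels(labels, k, rng):
    """Copy labels and shuffle the values at k randomly chosen positions."""
    idx = rng.sample(range(len(labels)), k)
    vals = [labels[i] for i in idx]
    rng.shuffle(vals)
    out = list(labels)
    for i, v in zip(idx, vals):
        out[i] = v
    return out


def progressive_permutation(data, labels, steps, n_rep, alpha, rng):
    """Return (k, mean number of significant features) for increasing k.

    data: list of features, each a list of per-sample values.
    k: number of sample labels shuffled at that level (0 = original data).
    """
    n = len(labels)
    curve = []
    for s in range(steps + 1):
        k = round(n * s / steps)
        counts = []
        # The unpermuted level is deterministic, so one pass is enough.
        for _ in range(n_rep if k > 0 else 1):
            perm = permute_labels(labels, k, rng)
            hits = 0
            for feat in data:
                a = [v for v, g in zip(feat, perm) if g == 0]
                b = [v for v, g in zip(feat, perm) if g == 1]
                if rank_sum_pvalue(a, b) < alpha:
                    hits += 1
            counts.append(hits)
        curve.append((k, mean(counts)))
    return curve


# Hypothetical toy data: 40 samples, 50 features, the first 10 shifted in group 1.
rng = random.Random(0)
labels = [0] * 20 + [1] * 20
data = [
    [rng.gauss(2.0 if (j < 10 and g == 1) else 0.0, 1.0) for g in labels]
    for j in range(50)
]

for k, hits in progressive_permutation(data, labels, steps=5, n_rep=20, alpha=0.05, rng=rng):
    print(f"labels shuffled: {k:2d}  mean significant features: {hits:.1f}")
```

When the grouping factor carries real signal, the number of significant features should fall toward the level expected by chance as more labels are shuffled; a flat curve would suggest that the original hits are hard to distinguish from a randomized background. A real analysis would use count-based abundance data and an appropriate differential abundance test rather than this simplified setup.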
