Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA

MOTIVATION
Beta diversity is the variation in community composition between microbiome samples. It can be measured by pairwise distances computed from either presence-absence or quantitative species-abundance data. PERMANOVA is a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances. It partitions within-group and between-group distances to assess the effect of an exposure or intervention (the grouping factor) on the sampled microbiome. To estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA, both the within-group distances and the size of the exposure or intervention effect must be accurately modeled.

RESULTS
We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed with pairwise distances. It includes: (i) a novel method for distance-matrix simulation that models within-group pairwise distances according to pre-specified population parameters; (ii) a method for incorporating effects of different sizes into the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements these methods. Matrices of pairwise distances can be efficiently simulated so that they satisfy the triangle inequality and incorporate group-level effects. These effects are quantified by omega-squared (ω²), an adjusted coefficient of determination. From the simulated distance matrices, one can estimate either the PERMANOVA power available at a given sample size or the sample size needed to reach a target power for a planned microbiome study.
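
The partition of distances that PERMANOVA performs, and the ω² effect size, can both be computed directly from a distance matrix without any sample coordinates. The following is a minimal pure-Python sketch of a one-way design. It assumes the usual sums of squares built from squared pairwise distances, and the usual one-way ANOVA form ω² = (SS_A − (a − 1)·MS_W) / (SS_T + MS_W). The function names `permanova_stats` and `permutation_p_value` are hypothetical illustrations, not the API of the accompanying R package.

```python
# Hypothetical sketch of one-way PERMANOVA statistics; not the R package's API.
from itertools import combinations
import random


def permanova_stats(dist, groups):
    """Return (pseudo_F, omega_squared) for a one-way PERMANOVA.

    dist   -- symmetric N x N matrix (list of lists) of pairwise distances
    groups -- length-N sequence of group labels (at least two groups)
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: (1/N) * sum of d_ij^2 over all pairs i < j.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: for each group g, (1/n_g) * sum of d_ij^2
    # over the pairs i < j that both belong to g.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    # The among-group sum of squares is the remainder.
    ss_among = ss_total - ss_within
    ms_within = ss_within / (n - a)
    pseudo_f = (ss_among / (a - 1)) / ms_within
    # Omega-squared (assumed one-way ANOVA form), an adjusted R^2.
    omega2 = (ss_among - (a - 1) * ms_within) / (ss_total + ms_within)
    return pseudo_f, omega2


def permutation_p_value(dist, groups, n_perm=999, seed=0):
    """Permutation p-value: shuffle the group labels and recompute pseudo-F."""
    rng = random.Random(seed)
    f_obs, _ = permanova_stats(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        f_perm, _ = permanova_stats(dist, labels)
        if f_perm >= f_obs:
            hits += 1
    # Count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Two groups of two samples: within-group distance 1, between-group distance 3.
example = [[0, 1, 3, 3],
           [1, 0, 3, 3],
           [3, 3, 0, 1],
           [3, 3, 1, 0]]
print(permanova_stats(example, ["A", "A", "B", "B"]))  # (17.0, 0.8)
```

In the toy matrix, SS_T = 38/4 = 9.5 and SS_W = 1/2 + 1/2 = 1, so SS_A = 8.5. This gives pseudo-F = 8.5 / 0.5 = 17 and ω² = (8.5 − 0.5) / (9.5 + 0.5) = 0.8. The sketch assumes that the within-group distances are not all zero. Otherwise MS_W is zero and the pseudo-F is undefined.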

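Simulation-based power estimation follows a simple loop. Simulate many studies under an assumed effect, run a permutation PERMANOVA on each, and report the fraction of studies that reject the null hypothesis at level α. This sketch does not reproduce the distance-matrix simulation method described above. It substitutes a simpler technique: it simulates hypothetical presence-absence communities and computes Jaccard distances. Jaccard distances form a metric, so the resulting matrix satisfies the triangle inequality by construction. The two-group effect model (parameter `delta`), the taxon count, and all function names are illustrative assumptions.

```python
# Hypothetical power-estimation sketch. It simulates presence-absence data and
# uses Jaccard distances instead of the paper's distance-matrix simulation.
import random
from itertools import combinations


def jaccard(x, y):
    """Jaccard distance between two presence-absence vectors (a true metric)."""
    inter = sum(1 for xi, yi in zip(x, y) if xi and yi)
    union = sum(1 for xi, yi in zip(x, y) if xi or yi)
    return 0.0 if union == 0 else 1.0 - inter / union


def pseudo_f(d2, groups):
    """One-way PERMANOVA pseudo-F from a matrix of squared distances."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(d2[i][j] for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d2[i][j] for i, j in combinations(idx, 2)) / len(idx)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def simulate_study(n_per_group, n_taxa, delta, rng):
    """Simulate one hypothetical two-group study; return (squared distances, labels).

    Assumed model: group A has presence probability 0.5 + delta for the first
    half of the taxa and 0.5 - delta for the rest; group B has the reverse.
    delta = 0 corresponds to no group effect (the null hypothesis).
    """
    if not 0.0 <= delta < 0.5:
        raise ValueError("delta must be in [0, 0.5)")
    if n_per_group < 2:
        raise ValueError("need at least two samples per group")
    half = n_taxa // 2
    samples, groups = [], []
    for label, sign in (("A", 1), ("B", -1)):
        probs = [0.5 + sign * delta if j < half else 0.5 - sign * delta
                 for j in range(n_taxa)]
        for _ in range(n_per_group):
            samples.append([rng.random() < p for p in probs])
            groups.append(label)
    n = len(samples)
    d2 = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d2[i][j] = d2[j][i] = jaccard(samples[i], samples[j]) ** 2
    return d2, groups


def estimate_power(n_per_group, n_taxa=40, delta=0.3, alpha=0.05,
                   n_sims=100, n_perm=199, seed=0):
    """Fraction of simulated studies whose permutation p-value is <= alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        d2, groups = simulate_study(n_per_group, n_taxa, delta, rng)
        f_obs = pseudo_f(d2, groups)
        perm = list(groups)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(perm)
            if pseudo_f(d2, perm) >= f_obs:
                hits += 1
        if (hits + 1) / (n_perm + 1) <= alpha:
            rejections += 1
    return rejections / n_sims
```

You can turn a power estimate into a sample-size estimate. Call `estimate_power` for increasing values of `n_per_group`, and take the smallest value whose estimated power reaches the target (for example, 0.8). Larger `n_sims` and `n_perm` give more precise Monte Carlo estimates, but they increase run time.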