A small‐sample kernel association test for correlated data with application to microbiome association studies

Recent research has highlighted the importance of the human microbiome in many human diseases and health conditions. Most current microbiome association analyses focus on unrelated samples; such methods are not appropriate for data collected from more advanced study designs, such as longitudinal and pedigree studies, where outcomes can be correlated. Ignoring these correlations can lead to suboptimal results or even biased conclusions. New methods that handle correlated outcome data in microbiome association studies are therefore needed. In this paper, we propose the correlated sequence kernel association test (CSKAT), which addresses such correlations using the linear mixed model. Specifically, random effects account for the outcome correlations, and a variance component test examines the microbiome effect. Unlike existing genetic association tests for longitudinal and family samples, CSKAT implements a correction procedure that better calibrates the null distribution of the score test statistic, accommodating the small sample sizes typical of microbiome studies. Comprehensive simulation studies demonstrate the validity and efficiency of our method: CSKAT achieves higher power than existing methods while correctly controlling the Type I error rate. We also apply our method to a microbiome data set collected from a UK twin study to illustrate its potential usefulness. A free implementation of our method in R is available at https://github.com/jchen1981/SSKAT.
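
To make the structure of the test concrete, the following is a minimal sketch of how a kernel-based variance component score test in a linear mixed model is commonly written. The notation is an assumption for illustration and is not taken from the paper: $y$ is the outcome vector, $X$ the covariate matrix, $K$ a kernel (similarity) matrix computed from the microbiome profiles, $Z$ the random-effect design matrix encoding the correlation structure, and $\tau$, $\sigma_b^2$, $\sigma^2$ the variance components.

```latex
% Linear mixed model: microbiome effect h, correlation-inducing random effects b
y = X\beta + h + Zb + \varepsilon, \qquad
h \sim N(0, \tau K), \quad
b \sim N(0, \sigma_b^2 I), \quad
\varepsilon \sim N(0, \sigma^2 I)

% No microbiome effect corresponds to a zero variance component
H_0 : \tau = 0

% Score statistic, with all estimates obtained under the null model
Q = (y - X\hat\beta)^{\top} \hat V^{-1} K \hat V^{-1} (y - X\hat\beta), \qquad
\hat V = \hat\sigma_b^2 Z Z^{\top} + \hat\sigma^2 I
```

In this generic formulation, the null distribution of $Q$ is usually approximated by a mixture of chi-square distributions, an approximation that relies on large samples. The small-sample correction described in the abstract targets the calibration of this null distribution; its exact form is not given here.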
