Group-combined P-values with applications to genetic association studies

MOTIVATION In large-scale genetic association studies with tens of hundreds of single nucleotide polymorphisms (SNPs) genotyped, the traditional statistical framework of logistic regression, which uses the maximum likelihood estimator (MLE) to infer the odds ratios of SNPs, may not work well. A large number of odds ratios must be estimated, and the MLEs may be unstable when some of the SNPs are in high linkage disequilibrium. In this situation, P-value combination procedures appear to provide good alternatives because they are built on single-marker analysis.

RESULTS The commonly used P-value combination methods (such as Fisher's combined test, the truncated product method, the truncated tail strength and the adaptive rank truncated product, ARTP) may lose power when the significance level varies across SNPs. To tackle this problem, a group-combined P-value method (GCP) is proposed, in which the P-values are divided into multiple groups and then combined at the group level. With this strategy, the significance values are integrated at different levels, and the power is improved. Simulations show that the GCP effectively controls the type I error rate and has additional power over the existing methods; the power increase can exceed 50% in some situations. The proposed GCP method is applied to data from the Genetic Analysis Workshop 16. Among all the methods, only the GCP and the ARTP reach significance in identifying a genomic region covering the gene DSC3 as associated with rheumatoid arthritis, and the GCP gives the smaller P-value.

AVAILABILITY AND IMPLEMENTATION http://www.statsci.amss.ac.cn/yjscy/yjy/lqz/201510/t20151027_313273.html

CONTACT liqz@amss.ac.cn

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
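To make the idea of combining P-values at the group level concrete, here is a minimal Python sketch. It is not the authors' GCP statistic: the abstract does not say how the groups are formed or how the group-level results are combined. The sketch assumes predefined groups and independent P-values, and it uses Fisher's combined test at both levels. The function names `fisher_combine` and `group_combined` are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Fisher's combined test for independent P-values.

    The statistic -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, where k = len(pvalues). For even
    degrees of freedom the survival function has a closed form, so no
    external library is needed.
    """
    if not pvalues:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in pvalues)  # statistic / 2
    k = len(pvalues)

    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def group_combined(groups):
    """Illustrative two-level combination (an assumption, not the paper's GCP).

    Each inner list is one predefined group of P-values. Fisher's test is
    applied within each group, and the resulting group-level P-values are
    combined with Fisher's test again.
    """
    group_pvalues = [fisher_combine(group) for group in groups]
    return fisher_combine(group_pvalues)


if __name__ == "__main__":
    # Hypothetical P-values: one group with strong signals, one without.
    groups = [[0.001, 0.004, 0.02], [0.45, 0.61, 0.83, 0.30]]
    flat = [p for group in groups for p in group]
    print("Fisher on all SNPs:   ", fisher_combine(flat))
    print("Two-level combination:", group_combined(groups))
```

Under the stated independence assumption each group-level P-value is uniform under the null, so the second Fisher step is valid. With SNPs in linkage disequilibrium that assumption fails, and a permutation-based null distribution would be needed instead.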
