Parsimonious and powerful composite likelihood testing for group difference and genotype-phenotype association

Studying the association between a phenotype and a number of genetic variants from case-control data is an important goal in many genetic studies. Association analysis is often carried out by testing the null hypothesis that two groups of multi-dimensional data are drawn from the same population. Testing based on genotype data is challenging because the full likelihood of the data is usually intractable. This difficulty may be tackled by composite likelihood (MCL) tests, which do not require the full likelihood. However, currently available MCL tests suffer severe power loss because they include non-informative or redundant sub-likelihoods. To reduce this power loss, we develop a forward search and test method that simultaneously performs powerful group difference testing and selects the informative sub-likelihoods to compose. The new method constructs a sequence of Wald-type test statistics by progressively including only informative sub-likelihoods, so as to improve test power under local sparsity alternatives. Numerical studies show that it achieves considerable improvement over the available tests as the modeling complexity grows. The new method is illustrated through an analysis of genotype data from a case-control study on breast cancer.
