Parsimonious and Efficient Likelihood Composition by Gibbs Sampling

The traditional maximum likelihood estimator (MLE) is often of limited use for complex high-dimensional data because the underlying likelihood function is intractable. Maximum composite likelihood estimation (McLE) avoids full likelihood specification by combining a number of partial likelihood objects, each depending on a small subset of the data, thus enabling inference for complex data. A fundamental difficulty in making the McLE approach practicable is selecting, from numerous candidate likelihood objects, those used to construct the composite likelihood function. In this article, we propose a flexible Gibbs sampling scheme for optimal selection of sub-likelihood components. The sampled composite likelihood functions are shown to converge in equilibrium to the one that is maximally informative about the unknown parameters, since sub-likelihood objects are chosen with probability depending on the variance of the corresponding McLE. A penalized version of our method generates sparse likelihoods with a relatively small number of components when the data are highly complex. Our algorithms are illustrated through numerical examples on simulated data as well as real genotype single nucleotide polymorphism (SNP) data from a case–control study.
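The selection idea can be illustrated with a minimal sketch. It is not the article's algorithm: the function names (`gibbs_select`, `toy_objective`), the target distribution proportional to exp(-tau * objective), the temperature `tau`, the penalty weight `lam`, and the toy variance formula are all assumptions made for illustration. Each candidate sub-likelihood is represented by an inclusion indicator, and a Gibbs sweep resamples each indicator from its conditional distribution, so that low-variance (and, with the penalty, sparse) compositions are visited most often.

```python
import math
import random


def gibbs_select(objective, d, n_iter=500, tau=100.0, seed=0):
    """Gibbs sampler over inclusion vectors w in {0,1}^d.

    Assumed target (illustrative only): p(w) proportional to
    exp(-tau * objective(w)). Returns the inclusion frequency of
    each of the d candidate components.
    """
    rng = random.Random(seed)
    w = [1] * d                      # start with every component included
    counts = [0] * d
    for _ in range(n_iter):
        for j in range(d):
            w[j] = 1
            g1 = objective(w)        # objective with component j included
            w[j] = 0
            g0 = objective(w)        # objective with component j excluded
            # Conditional: P(w_j = 1 | rest) = 1 / (1 + exp(tau * (g1 - g0)))
            z = tau * (g1 - g0)
            if z > 700:
                p1 = 0.0
            elif z < -700:
                p1 = 1.0
            else:
                p1 = 1.0 / (1.0 + math.exp(z))
            w[j] = 1 if rng.random() < p1 else 0
        for j in range(d):
            counts[j] += w[j]
    return [c / n_iter for c in counts]


def toy_objective(w, variances=(1.0, 1.0, 100.0, 100.0), lam=0.05):
    """Hypothetical stand-in for the variance of the composite estimator.

    Treats each component as an independent unbiased estimator with the
    given variance, combined by inverse-variance weighting, plus a
    sparsity penalty lam * (number of selected components).
    """
    precision = sum(wj / vj for wj, vj in zip(w, variances))
    if precision == 0.0:
        return math.inf              # an empty composition is not allowed
    return 1.0 / precision + lam * sum(w)


freq = gibbs_select(toy_objective, d=4)
print(freq)  # inclusion frequencies for the four toy components
```

In this toy setting the two low-variance components should be retained in almost every sweep, while the two noisy components are rarely selected because their small variance reduction does not offset the penalty. In a real application the objective would be replaced by an estimate of the McLE variance for the selected composition.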
