Recovering Independent Associations in Genetics: A Comparison

In genetics, it is often of interest to discover single nucleotide polymorphisms (SNPs) that are directly related to a disease, rather than merely associated with it through correlation with other SNPs. Few methods, however, address this so-called "true sparsity recovery" problem. In a thorough simulation study, we show that when the correlation between predictors is low or moderate, lasso-based methods perform well at true sparsity recovery, despite not being designed specifically for this purpose. When correlations are large, however, more specialized methods are needed: stability selection and direct effect testing perform well in all the settings studied, including those with large correlation.
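The abstract contrasts plain lasso selection with stability selection. Below is a minimal sketch of stability selection wrapped around a lasso, written in pure Python. It makes several simplifying assumptions. The response is continuous and Gaussian rather than a binary disease status. The predictors are equicorrelated Gaussians rather than SNP genotypes. A single fixed penalty `lam` replaces the maximum over a grid of penalties, and the selection threshold is set to 0.6. The names `simulate`, `lasso_cd` and `stability_selection` are hypothetical. This is not the simulation design or code used in the study.

```python
import math
import random


def simulate(n, beta, rho, noise_sd=1.0, seed=0):
    """Toy stand-in for SNP data (an assumption, not real genotypes):
    unit-variance Gaussian predictors with pairwise correlation rho,
    and a linear Gaussian response."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared factor that induces the correlation
        row = [math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
               for _ in beta]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, noise_sd))
    return X, y


def lasso_cd(X, y, lam, max_sweeps=200, tol=1e-8):
    """Lasso via cyclic coordinate descent on centred data (no intercept),
    minimising (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        mean_j = sum(row[j] for row in X) / n
        cols.append([row[j] - mean_j for row in X])
    y_mean = sum(y) / n
    r = [v - y_mean for v in y]  # residual for b = 0
    sq = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # constant column: coefficient stays at zero
            # Inner product of column j with the partial residual (b_j added back).
            g = sum(c * ri for c, ri in zip(cols[j], r)) / n + sq[j] * b[j]
            # Soft-thresholding is the exact minimiser along coordinate j.
            new = math.copysign(max(abs(g) - lam, 0.0), g) / sq[j]
            delta = new - b[j]
            if delta != 0.0:
                r = [ri - delta * c for ri, c in zip(r, cols[j])]
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def stability_selection(X, y, lam, n_subsamples=50, threshold=0.6, seed=1):
    """Fraction of random half-samples in which the lasso selects each
    predictor; predictors at or above `threshold` are kept."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # half-sample without replacement
        b = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, bj in enumerate(b):
            if bj != 0.0:
                counts[j] += 1
    freq = [c / n_subsamples for c in counts]
    selected = [j for j, f in enumerate(freq) if f >= threshold]
    return freq, selected


beta = [2.0, -1.5, 1.0] + [0.0] * 7  # predictors 0, 1, 2 have direct effects
X, y = simulate(n=300, beta=beta, rho=0.2, seed=42)
freq, selected = stability_selection(X, y, lam=0.3)
print("selection frequencies:", [round(f, 2) for f in freq])
print("selected predictors:", selected)
```

Here `rho=0.2` corresponds to the low-correlation setting. Raising `rho` toward 1 probes the high-correlation setting, where, according to the abstract, plain lasso selection becomes unreliable and more specialized methods are needed.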