STATISTICAL METHODS FOR VARIABLE SELECTION IN THE CONTEXT OF HIGH-DIMENSIONAL DATA: LASSO AND EXTENSIONS

With the advance of technology, the collection and storage of data have become routine. Huge amounts of data are increasingly produced from biological experiments. The advent of DNA microarray technologies has enabled scientists to measure the expression of tens of thousands of genes simultaneously. Single nucleotide polymorphisms (SNPs) are being used in genetic association studies of a wide range of phenotypes, for example complex diseases. These high-dimensional problems are becoming more and more common. The “large p, small n” problem, in which there are more variables than samples, is currently a challenge that many statisticians face. Penalized variable selection is an effective way to deal with the “large p, small n” problem. In particular, the Lasso (least absolute shrinkage and selection operator) proposed by Tibshirani has become a widely used method for this type of problem. The Lasso works well when the covariates can be treated individually; when the covariates are grouped, it does not work well. Elastic net, group lasso, group MCP and group bridge are extensions of the Lasso. Group lasso enforces sparsity at the group level, rather than at the level of the individual covariates. Group bridge and group MCP produce sparse solutions both at the group level and at the level of the individual covariates within a group. Our simulation study shows that the group lasso forces complete grouping, group MCP encourages grouping to a rather slight
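The contrast between individual-level and group-level sparsity described above can be illustrated with a minimal sketch. It is not taken from the study itself. It assumes an orthonormal design, so that each penalized estimate reduces to a thresholding of the least-squares coefficients `z`, and it omits the usual group-size weights on the group lasso penalty. The names `soft_threshold`, `group_soft_threshold`, `z`, `groups` and `lam` are hypothetical and chosen only for illustration.

```python
import math


def soft_threshold(z, lam):
    """Lasso-style step: shrink every coefficient toward zero on its own."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]


def group_soft_threshold(z, groups, lam):
    """Group-lasso-style step: shrink each group by its Euclidean norm.

    A group is either kept (all members rescaled by the same factor)
    or dropped as a whole (all members set to zero).
    """
    out = [0.0] * len(z)
    for idx in groups:
        norm = math.sqrt(sum(z[j] ** 2 for j in idx))
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        for j in idx:
            out[j] = scale * z[j]
    return out


# Hypothetical least-squares coefficients: one strong group, one weak group.
z = [3.0, 0.5, 0.4, 0.3]
groups = [[0, 1], [2, 3]]  # assumed group structure, indices into z
lam = 1.0                  # assumed penalty level

lasso_beta = soft_threshold(z, lam)
group_beta = group_soft_threshold(z, groups, lam)

print("lasso:      ", lasso_beta)
print("group lasso:", group_beta)
```

In this toy setting the individual thresholding zeroes the small coefficient inside the first group, whereas the group version keeps both members of the first group and removes the second group entirely. That is the "complete grouping" behaviour attributed to the group lasso; bi-level methods such as group bridge and group MCP would additionally allow zeros within a retained group.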
