Estimating a sparse reduction for general regression in high dimensions

Although the concept of sufficient dimension reduction was proposed long ago, studies in the literature have largely focused on properties of estimators of dimension-reduction subspaces in the classical "small p, large n" setting. Rather than the subspace, this paper considers directly the set of reduced predictors, which we believe is more relevant for subsequent analyses. A principled method is proposed for estimating a sparse reduction, based on a new, revised representation of the well-known sliced inverse regression. A fast and efficient algorithm is developed for computing the estimator. The asymptotic behavior of the new method is studied when the number of predictors, p, exceeds the sample size, n, providing guidance on choosing the number of sufficient dimension-reduction predictors. Numerical results, including a simulation study and a cancer-drug-sensitivity data analysis, are presented to examine the performance of the proposed method.
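To make the underlying machinery concrete, the following is a minimal NumPy sketch of classical sliced inverse regression (Li, 1991) in the p < n setting, which produces the reduced predictors that the paper's sparse, penalized representation builds on. It is not the paper's sparse high-dimensional estimator; the function name, slicing scheme, and parameter choices are illustrative assumptions.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=2):
    """Illustrative classical SIR; assumes p < n so the covariance is invertible."""
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    X_centered = X - X.mean(axis=0)
    Sigma = np.cov(X_centered, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = X_centered @ Sigma_inv_sqrt

    # Slice the response into roughly equal-sized groups by its order statistics.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the within-slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        p_h = len(idx) / n
        m_h = Z[idx].mean(axis=0)
        M += p_h * np.outer(m_h, m_h)

    # Leading eigenvectors of M, mapped back to the original predictor scale.
    eigvals, eigvecs = np.linalg.eigh(M)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_directions]]
    beta = Sigma_inv_sqrt @ top        # columns span the estimated subspace
    reduced = X_centered @ beta        # the reduced predictors
    return beta, reduced

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 500, 10
    X = rng.standard_normal((n, p))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
    beta, reduced = sliced_inverse_regression(X, y, n_slices=8, n_directions=2)
    print(beta.round(2))
```

In the high-dimensional setting treated by the paper (p > n), the sample covariance above is singular and the estimated directions are dense, which is what motivates the paper's sparse, penalized reformulation and its accompanying fast algorithm.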
