Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units

In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on the experimental platforms generating them, or on available information about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). sOLS combines ideas from the existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package "sSDR," publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.
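To fix ideas, the following is a minimal sketch of the groupwise OLS-based reduction idea described above, not the paper's actual sOLS algorithm or the sSDR package API. It relies on the classical fact that, when the predictors have a linear conditional mean, the OLS coefficient vector lies in the central mean subspace; here we compute one OLS direction within each predictor group and use it to reduce that group to a single composite score. All function names and the grouping are illustrative assumptions.

```python
import numpy as np

def ols_direction(X, y):
    """OLS coefficient vector of y on the columns of X.
    Under a linear conditional mean for the predictors, this
    direction lies in the central mean subspace (Li & Duan, 1989)."""
    Xc = X - X.mean(axis=0)          # center predictors
    yc = y - y.mean()                # center response
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)

def groupwise_ols(X, y, groups):
    """Illustrative sketch: one OLS direction per predictor group,
    assembled into a block-diagonal reduction matrix B so that
    X @ B yields one composite predictor per group."""
    p = X.shape[1]
    B = np.zeros((p, len(groups)))
    for j, idx in enumerate(groups):
        B[np.asarray(idx), j] = ols_direction(X[:, idx], y)
    return B

# Toy example: four predictors in two known groups; only the
# first predictor of each group is active in the regression.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
y = X[:, 0] + 2.0 * X[:, 2] + 0.1 * rng.standard_normal(500)
B = groupwise_ols(X, y, groups=[[0, 1], [2, 3]])
Z = X @ B  # reduced predictors, one column per group
```

Because each group's direction is a single within-group least-squares fit, the cost scales with the group sizes rather than with a joint optimization over all groups, which is the computational advantage the abstract alludes to; near-zero loadings within a group also suggest an informal variable screening.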
