Variable importance assessment in sliced inverse regression for variable selection

ABSTRACT We study the relationship between a dependent variable y and a multivariate covariate x in a semiparametric regression model. Since the purpose of most research in the social, biological, or environmental sciences is explanation, determining the importance of the variables is a major concern: it identifies which variables matter most when predicting y. Sliced inverse regression methods reduce the dimension of the covariate x by estimating the directions β that span an effective dimension reduction (EDR) space. The aim of this article is to propose a computational method based on a variable importance measure (relying only on the EDR space) in order to select the most useful variables. The numerical behavior of this new method, implemented in R, is studied in a simulation study. An illustration on a real dataset is also provided.
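
For orientation, the semiparametric model and the EDR space mentioned in the abstract can be written out as in the standard sliced inverse regression formulation (Li, 1991). This is a sketch under that assumption: the abstract gives no formulas, and the symbols K, H, p_h, m_h, Σ and Γ are notation introduced here for illustration. The article's variable importance measure itself is not reproduced.

```latex
% Semiparametric model: y depends on x only through K linear combinations.
% f is an unknown link function and \varepsilon is a noise term independent of x.
\[
  y = f\left(x^{\top}\beta_1, \dots, x^{\top}\beta_K, \varepsilon\right),
  \qquad
  \text{EDR space} = \operatorname{span}(\beta_1, \dots, \beta_K).
\]

% Slicing estimator: split the range of y into H slices with empirical
% proportions \hat{p}_h and within-slice covariate means \hat{m}_h.
\[
  \hat{\Gamma} = \sum_{h=1}^{H} \hat{p}_h\,
    (\hat{m}_h - \bar{x})(\hat{m}_h - \bar{x})^{\top}.
\]

% Estimated EDR directions: eigenvectors of \hat{\Sigma}^{-1}\hat{\Gamma}
% associated with the K largest eigenvalues, where \hat{\Sigma} is the
% empirical covariance matrix of x.
\[
  \hat{\Sigma}^{-1}\hat{\Gamma}\,\hat{b}_k = \hat{\lambda}_k\,\hat{b}_k,
  \qquad k = 1, \dots, K.
\]
```

Under the usual linearity condition on the distribution of x, the estimated vectors recover the EDR space rather than the individual β_k, which are not identifiable on their own.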
