Feature Selection using Partial Least Squares regression and optimal experiment design

We propose a supervised feature selection technique, called Optimal Loadings, based on applying the theory of Optimal Experiment Design (OED) to Partial Least Squares (PLS) regression. We apply the OED criteria to PLS with the goal of selecting an optimal feature subset that minimizes the variance of the regression model and hence its prediction error. We show that the variance of the PLS model can be minimized by applying the OED criteria to the loadings covariance matrix obtained from PLS. We also provide an intuitive viewpoint on the technique by deriving the A-optimality version of the Optimal Loadings criterion from the properties of maximum relevance and minimum redundancy for PLS models. In our experiments we use the D-optimality version of the criterion, which maximizes the determinant of the loadings covariance matrix. To overcome the computational challenges of this criterion, we provide an approximate D-optimality criterion along with its theoretical justification.
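To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of a D-optimality-style selection over PLS loadings: PLS1 loadings are computed with a basic NIPALS loop, and features are then chosen greedily to maximize the log-determinant of the covariance of the selected loading rows. The function names, the ridge term, and the greedy strategy are illustrative assumptions, not part of the paper.

```python
import numpy as np

def pls_loadings(X, y, n_components=2):
    """Minimal PLS1 (NIPALS) returning the X-loadings matrix P
    of shape (n_features, n_components). Illustrative sketch only."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    loadings = []
    Xk, yk = X.copy(), y.copy()
    for _ in range(n_components):
        w = Xk.T @ yk                    # weight vector from covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                       # score vector
        p = Xk.T @ t / (t @ t)           # loading vector for this component
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - t * (yk @ t) / (t @ t) # deflate y
        loadings.append(p)
    return np.column_stack(loadings)

def select_features_dopt(P, k, ridge=1e-8):
    """Greedy approximate D-optimality: grow the subset S so as to
    maximize log det(P_S^T P_S), where P_S holds the loading rows of
    the selected features. A small ridge keeps the matrix invertible."""
    n_features, n_comp = P.shape
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in remaining:
            rows = P[selected + [j]]
            M = rows.T @ rows + ridge * np.eye(n_comp)
            val = np.linalg.slogdet(M)[1]
            if val > best_val:
                best_j, best_val = j, val
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Greedy subset growth is a common surrogate for exact D-optimal design, which is combinatorial; the paper's approximate criterion addresses the same computational obstacle.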
