Gaussian latent variable models for variable selection

Variable selection has been extensively studied in linear regression and classification models. Most of these models assume that the input variables are noise-free and that only the response variables are corrupted by Gaussian noise. In this paper, we discuss the variable selection problem under the assumption that both the input variables and the response variables are corrupted by Gaussian noise. We analyze the prediction error when one related noisy variable is added to the predictor set. We show that the prediction error always decreases as more variables are employed for prediction, provided the joint distribution of the variables is known. Based on this analysis, the optimal variable selection, in the sense of mean squared error, can be obtained. We find that the result is very different from that of the matching pursuit (MP) algorithm, which is widely used in variable selection problems.
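The monotonicity claim can be illustrated directly: for jointly Gaussian variables, the MSE of the best predictor of y given a subset S of inputs is the conditional variance Var(y | x_S) = Σ_yy − Σ_yS Σ_SS⁻¹ Σ_Sy, which cannot increase as S grows. Below is a minimal sketch of this, assuming a known joint Gaussian covariance; the covariance construction and the function name `conditional_mse` are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random positive-definite joint covariance for (x_1, ..., x_d, y);
# y is the last index. This stands in for the "known joint distribution".
d = 5
A = rng.standard_normal((d + 1, d + 1))
Sigma = A @ A.T + np.eye(d + 1)

def conditional_mse(Sigma, subset):
    """MSE of the optimal (conditional-mean) predictor of y given x[subset]:
    Var(y | x_S) = Sigma_yy - Sigma_yS Sigma_SS^{-1} Sigma_Sy."""
    S = list(subset)
    if not S:
        return Sigma[-1, -1]          # no predictors: MSE is the prior variance
    Sigma_SS = Sigma[np.ix_(S, S)]    # covariance among the selected inputs
    Sigma_yS = Sigma[-1, S]           # cross-covariance between y and the inputs
    return Sigma[-1, -1] - Sigma_yS @ np.linalg.solve(Sigma_SS, Sigma_yS)

# Along any nested sequence of subsets, the MSE is non-increasing.
mses = [conditional_mse(Sigma, range(k)) for k in range(d + 1)]
print(np.round(mses, 4))
assert all(a >= b - 1e-12 for a, b in zip(mses, mses[1:]))
```

This is also the sense in which the optimal selection differs from greedy MP: under a known joint distribution, subsets are ranked by their exact conditional MSE rather than by one-variable-at-a-time residual correlations.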
