Variable Selection in Nonparametric Classification Via Measurement Error Model Selection Likelihoods

Motivated by the relationships among ridge regression, LASSO estimation, and measurement error attenuation, we develop a new measurement-error-model-based approach to variable selection. After describing the approach in the familiar context of linear regression, we apply it to variable selection in nonparametric classification, obtaining a new kernel-based classifier with LASSO-like shrinkage and variable-selection properties. Finite-sample performance of the new classifier is examined via simulation and real-data examples, and its consistency is studied theoretically. Supplementary materials for the article are available online.
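The motivating connection mentioned above is the classical attenuation effect: least squares fit to an error-contaminated predictor shrinks the slope toward zero by the reliability ratio, much as a ridge penalty does. The sketch below is only a minimal numerical illustration of that connection, not the paper's selection-likelihood method; all names and the choice of penalty are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's method): additive measurement error in a
# predictor attenuates the least-squares slope, paralleling ridge shrinkage.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
beta = 2.0
sigma_x, sigma_u = 1.0, 0.5            # predictor SD and measurement-error SD

x = rng.normal(0.0, sigma_x, n)        # true predictor
w = x + rng.normal(0.0, sigma_u, n)    # error-contaminated predictor
y = beta * x + rng.normal(0.0, 1.0, n)

# Naive OLS on the contaminated predictor is attenuated toward zero by the
# reliability ratio sigma_x^2 / (sigma_x^2 + sigma_u^2).
ols_naive = np.cov(w, y, bias=True)[0, 1] / np.var(w)

# Ridge regression on the true predictor shrinks similarly; taking the penalty
# lam = n * sigma_u^2 matches the attenuation factor in expectation.
lam = n * sigma_u**2
ridge = (x @ y) / (x @ x + lam)

attenuation = sigma_x**2 / (sigma_x**2 + sigma_u**2)
print(f"attenuation factor: {attenuation:.3f}")
print(f"naive OLS slope:    {ols_naive:.3f}  (about {beta * attenuation:.3f})")
print(f"ridge slope:        {ridge:.3f}  (about {beta * attenuation:.3f})")
```

Both estimates land near beta times the reliability ratio, which is the ridge/measurement-error parallel that the abstract cites as motivation.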
