A variable selection criterion for linear discriminant rule and its optimality in high dimensional and large sample data

In this paper, we propose a new variable selection procedure, called MEC, for the linear discriminant rule (LDR) in the high-dimensional, large-sample setting. MEC is derived as a second-order unbiased estimator of the misclassification error probability of the LDR. We show that MEC not only decomposes asymptotically into 'fitting' and 'penalty' terms, like AIC and Mallows' $C_p$, but is also asymptotically optimal in the sense that it achieves the smallest possible conditional probability of misclassification among the candidate variable sets. Simulation studies show that MEC performs well at selecting the true variable set.
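The idea of selecting variables by an estimated misclassification error can be sketched as follows. This is only an illustration under stated assumptions and is not the MEC criterion, whose formula is not given in this abstract. It assumes two normal groups with a common covariance matrix and scores each candidate subset by $\Phi(-D_u/2)$, where $D_u^2$ is the standard unbiased estimator of the squared Mahalanobis distance. The function names `sample_d2`, `corrected_error`, and `select_subset` are hypothetical.

```python
import itertools
import math

import numpy as np


def sample_d2(x1, x2, subset):
    """Squared sample Mahalanobis distance between two groups on `subset`."""
    idx = list(subset)
    a, b = x1[:, idx], x2[:, idx]
    n1, n2 = a.shape[0], b.shape[0]
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled sample covariance with n1 + n2 - 2 degrees of freedom.
    pooled = ((n1 - 1) * np.cov(a, rowvar=False)
              + (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov returns a 0-d array for one variable
    return float(diff @ np.linalg.solve(pooled, diff))


def corrected_error(x1, x2, subset):
    """Illustrative criterion (an assumption, NOT MEC): Phi(-D_u / 2).

    D_u^2 = (n - p - 1) / n * D^2 - p * (n1 + n2) / (n1 * n2), with
    n = n1 + n2 - 2, is unbiased for the squared population distance
    under normality; it is truncated at zero here.
    """
    p = len(subset)
    n1, n2 = x1.shape[0], x2.shape[0]
    n = n1 + n2 - 2
    d2 = sample_d2(x1, x2, subset)
    d2_u = max((n - p - 1) / n * d2 - p * (n1 + n2) / (n1 * n2), 0.0)
    # Phi(-x / 2) written with the complementary error function.
    return 0.5 * math.erfc(math.sqrt(d2_u) / (2.0 * math.sqrt(2.0)))


def select_subset(x1, x2):
    """Exhaustive search over all non-empty variable subsets."""
    p = x1.shape[1]
    candidates = [c for k in range(1, p + 1)
                  for c in itertools.combinations(range(p), k)]
    return min(candidates, key=lambda c: corrected_error(x1, x2, c))
```

The uncorrected distance returned by `sample_d2` never decreases when a variable is added, so an uncorrected plug-in error would always favor the full set. The correction in `corrected_error` plays the role of a penalty in this sketch. Exhaustive search is feasible only for a small number of variables.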
