Understanding and Evaluating Sparse Linear Discriminant Analysis: Supplementary File

To begin, we use $B$ to project all of the training data into a low-dimensional discriminant space. Let $\Sigma_w^B$ and $\mu_k$ denote, respectively, the pooled within-class covariance matrix and the mean of class $k$ for the projected data. Given a new observation $x^\star$, we assume that $B^\top x^\star$ is drawn from a normal distribution with mean $\mu_k$ for some class $k$ and covariance $\Sigma_w^B$. We then assign $x^\star$ to the class $\delta_B(x^\star)$ with maximum posterior probability in the discriminant space. Assuming that each class has the same prior probability, $\delta_B(x^\star)$ can be computed as
$$\delta_B(x^\star) = \arg\min_k \; (B^\top x^\star - \mu_k)^\top (\Sigma_w^B)^{-1} (B^\top x^\star - \mu_k).$$
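The classification rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `B`, `class_means`, and `Sigma_w` are assumed to have been estimated beforehand (projection matrix, projected class means, and pooled within-class covariance of the projected data, respectively).

```python
import numpy as np

def discriminant_classify(x, B, class_means, Sigma_w):
    """Assign x to the class minimizing the Mahalanobis distance
    (B^T x - mu_k)^T (Sigma_w^B)^{-1} (B^T x - mu_k) in the
    discriminant space spanned by the columns of B.

    x           : (p,) new observation
    B           : (p, d) projection matrix
    class_means : list of (d,) projected class means mu_k
    Sigma_w     : (d, d) pooled within-class covariance of projected data
    """
    z = B.T @ x                            # project into discriminant space
    Sigma_inv = np.linalg.inv(Sigma_w)     # (Sigma_w^B)^{-1}
    dists = [(z - mu) @ Sigma_inv @ (z - mu) for mu in class_means]
    return int(np.argmin(dists))           # arg min over classes k
```

With equal class priors this Mahalanobis rule is equivalent to maximizing the Gaussian posterior, since the log-density differs from the quadratic form only by terms that are constant across classes.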
