Understanding and Evaluating Sparse Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a simple yet powerful technique for assigning a p-dimensional feature vector to one of K classes based on a linear projection learned from N labeled observations. However, it is well established that in the high-dimensional setting (p > N) the underlying projection estimator degenerates. Moreover, any linear discriminant function involving a large number of features may be difficult to interpret. To ameliorate these issues, two general categories of sparse LDA modifications have been proposed, both to reduce the number of active features and to stabilize the resulting projections. The first, based on optimal scoring, is more straightforward to implement and analyze but has been heavily criticized for its ambiguous connection with the original LDA formulation. In contrast, the second strategy applies sparse penalty functions directly to the original LDA objective but requires additional heuristic trade-off parameters, has unknown global and local minima properties, and relies on a greedy sequential optimization procedure. In all cases the choice of sparse regularizer can be important, but no rigorous guidelines have been provided regarding which penalty might be preferable. Against this backdrop, we winnow down the broad space of candidate sparse LDA algorithms and promote a specific selection based on optimal scoring coupled with a particular, complementary sparse regularizer. This overall process ultimately advances our understanding of sparse LDA in general, while leading to targeted modifications of existing algorithms that produce superior results in practice on three high-dimensional gene data sets.
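
The degeneracy in the p > N setting can be illustrated with a minimal sketch. It is not taken from the paper: the helper name within_class_scatter, the synthetic Gaussian data, and the sizes N, p, and K are all assumptions chosen for illustration. The pooled within-class scatter matrix, which classical LDA must invert, has rank at most N - K, so it is necessarily singular when p > N.

```python
import numpy as np

def within_class_scatter(X, y):
    """Pooled within-class scatter matrix (p x p) for data X (N x p).

    Hypothetical helper for illustration only.
    """
    p = X.shape[1]
    S_w = np.zeros((p, p))
    for k in np.unique(y):
        Xk = X[y == k]
        centered = Xk - Xk.mean(axis=0)  # remove the class mean
        S_w += centered.T @ centered
    return S_w

# Assumed toy sizes: more features than observations (p > N).
rng = np.random.default_rng(0)
N, p, K = 20, 50, 2
X = rng.standard_normal((N, p))
y = np.repeat(np.arange(K), N // K)  # balanced class labels

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)

# The rank cannot exceed N - K, so S_w is singular whenever p > N
# and the classical LDA projection is not uniquely defined.
print(f"rank(S_w) = {rank}, p = {p}, N - K = {N - K}")
```

Because each class contributes at most (class size - 1) independent directions after centering, the rank bound holds regardless of the data distribution; this is one reason regularized or sparse variants are considered in high dimensions.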
