Supervised Classification Using Sparse Fisher's LDA

It is well known that, in the supervised classification setting, Fisher's linear discriminant rule is asymptotically Bayes when the number of features is smaller than the number of observations. However, many modern applications require classification in the high-dimensional setting, where a naive implementation of Fisher's rule fails because the sample covariance matrix is singular. Moreover, a classifier that relies on all features is difficult to interpret. Our goal is robust classification that relies only on a small subset of important features and accounts for the underlying correlation structure. We apply a lasso-type penalty to the discriminant vector to ensure sparsity of the solution and use a shrinkage-type estimator of the covariance matrix. The resulting optimization problem is solved with an iterative coordinate ascent algorithm. Furthermore, we analyze the effect of nonconvexity on the sparsity level of the solution and highlight the difference between the penalized and the constrained versions of the problem. Simulation results show that the proposed method compares favorably with alternatives. The method is used to classify leukemia patients on the basis of DNA methylation features.
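
To make the ingredients above concrete (a shrinkage covariance estimate, a lasso penalty on the discriminant vector, and coordinate-wise updates), the following is a minimal Python sketch. It solves a related convex surrogate, a lasso-penalized quadratic in the discriminant vector in the spirit of direct sparse discriminant analysis, not the paper's exact Fisher-type problem; the function names and the tuning parameters lam (sparsity) and alpha (shrinkage intensity) are illustrative assumptions.

    import numpy as np

    def pooled_shrinkage_cov(X1, X2, alpha=0.1):
        # Pooled within-class covariance, shrunk toward a scaled-identity
        # target so the estimate is well conditioned even when p > n.
        Xc = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
        S = Xc.T @ Xc / (len(X1) + len(X2) - 2)
        p = S.shape[0]
        return (1.0 - alpha) * S + alpha * (np.trace(S) / p) * np.eye(p)

    def sparse_discriminant(X1, X2, lam=0.1, alpha=0.1, n_iter=500, tol=1e-8):
        # Coordinate descent for the lasso-penalized quadratic
        #   minimize_v  0.5 * v' Sigma v - v' delta + lam * ||v||_1,
        # where delta is the difference of the class mean vectors.
        # Each coordinate update has a closed-form soft-thresholding solution.
        Sigma = pooled_shrinkage_cov(X1, X2, alpha)
        delta = X1.mean(axis=0) - X2.mean(axis=0)
        v = np.zeros(delta.shape[0])
        for _ in range(n_iter):
            v_old = v.copy()
            for j in range(len(v)):
                # Partial residual: exclude coordinate j from the quadratic term.
                r = delta[j] - Sigma[j] @ v + Sigma[j, j] * v[j]
                v[j] = np.sign(r) * max(abs(r) - lam, 0.0) / Sigma[j, j]
            if np.max(np.abs(v - v_old)) < tol:
                break
        return v  # sparse discriminant direction; larger lam yields fewer nonzeros

A new observation x can then be assigned to the first class when v'(x - (m1 + m2)/2) > 0, where m1 and m2 are the class means (equal priors assumed); increasing lam trades classification accuracy for a smaller, more interpretable feature set.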
