A review of discriminant analysis in high dimensions

Linear discriminant analysis (LDA) is one of the oldest classification techniques, yet it remains a popular and important classifier in practice. However, advances in science and technology have brought the new challenge of high-dimensional datasets, in which the dimension can run into the thousands. On such datasets classical LDA is inapplicable: it requires inverting the pooled sample covariance matrix, which is singular whenever the dimension exceeds the sample size. Recently, statisticians have devoted considerable effort to developing high-dimensional LDA methods, which typically perform variable selection via regularization techniques. Theoretical results, algorithms, and empirical studies all support the application of these methods. In this review, we briefly describe the difficulties in extending LDA to high dimensions and present some successful proposals.

WIREs Comput Stat 2013, 5:190–197. doi: 10.1002/wics.1257
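To make the difficulty concrete, recall the classical two-class rule that every extension must repair; the display below is the standard textbook form of LDA, with notation chosen here for illustration rather than taken from any single paper under review. For classes with means μ1 and μ2, common covariance Σ, and prior probabilities π1 and π2, an observation x is assigned to class 1 when

\[
\delta(\mathbf{x}) \;=\; \Bigl(\mathbf{x} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\Bigr)^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \;+\; \log\frac{\pi_1}{\pi_2} \;>\; 0 .
\]

In practice Σ is replaced by the pooled sample covariance matrix, whose rank is at most n − 2; when the dimension p exceeds the sample size n this matrix is singular, its inverse does not exist, and the rule above cannot even be evaluated. This is the precise sense in which classical LDA is inapplicable in high dimensions.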
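One common repair, and the regularization idea in its simplest form, is to replace the full covariance matrix by its diagonal and to soft-threshold the standardized mean differences, so that noise features drop out of the rule entirely. The following minimal Python sketch illustrates this strategy; it is written in the spirit of shrunken-centroid and independence-rule classifiers, but the function names and the `threshold` parameter are our own illustrative choices, not the exact procedure of any paper reviewed here.

import numpy as np

def fit_sparse_diag_lda(X, y, threshold=0.8):
    """Two-class diagonal LDA with soft-thresholded standardized mean
    differences. Illustrative sketch only: the diagonal covariance avoids
    inverting a singular p x p matrix, and thresholding selects variables.
    X is an (n, p) data matrix; y holds labels in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled per-feature variances (the diagonal of the pooled covariance).
    pooled = ((X0 - mu0) ** 2).sum(axis=0) + ((X1 - mu1) ** 2).sum(axis=0)
    var = pooled / (n0 + n1 - 2) + 1e-8  # tiny ridge for numerical safety
    # Standardized mean differences, soft-thresholded: features whose
    # signal falls below `threshold` get weight exactly zero (selection).
    d = (mu1 - mu0) / np.sqrt(var)
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
    w = d / np.sqrt(var)  # discriminant direction on the raw scale
    b = -w @ (mu0 + mu1) / 2 + np.log(n1 / n0)  # intercept incl. priors
    return w, b

def predict(X, w, b):
    """Assign class 1 when the linear score is positive."""
    return (X @ w + b > 0).astype(int)

# Tiny synthetic check: only the first 5 of 200 features carry signal.
rng = np.random.default_rng(0)
n, p = 60, 200
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 1.5
w, b = fit_sparse_diag_lda(X, y)
print("selected features:", np.flatnonzero(w))
print("training accuracy:", (predict(X, w, b) == y).mean())

Running the script prints the indices of the retained features (ideally the five signal coordinates) and the training accuracy, showing how a single thresholding step performs variable selection and classification at once even though p is much larger than n.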
