$\mathcal{DBSDA}$: Lowering the Bound of Misclassification Rate for Sparse Linear Discriminant Analysis via Model Debiasing

Linear discriminant analysis (LDA) is a well-known technique for linear classification, feature extraction, and dimension reduction. To improve the accuracy of LDA in high-dimension, low-sample-size (HDLSS) settings, shrunken estimators such as the Graphical Lasso can be used to strike a balance between bias and variance. Although an estimator with induced sparsity attains a faster convergence rate, the bias it introduces may also degrade classification performance. In this paper, we theoretically analyze how the sparsity and the convergence rate of the precision matrix (also known as the inverse covariance matrix) estimator affect classification accuracy by proposing an analytic model of the upper bound of the LDA misclassification rate. Guided by this model, we propose a novel classifier, $\mathcal{DBSDA}$, which improves classification accuracy through debiasing. Theoretical analysis shows that $\mathcal{DBSDA}$ has a lower upper bound on the misclassification rate and better asymptotic properties than sparse LDA (SDA). We conduct experiments on both synthetic and real application datasets to confirm the correctness of our theoretical analysis and to demonstrate the superiority of $\mathcal{DBSDA}$ over LDA, SDA, and other competing classifiers under HDLSS settings.
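
To make the objects named in the abstract concrete, the following is a brief sketch of the standard two-class setting, not a statement of the paper's exact estimator. The symbols $\mu_1, \mu_2, \pi_1, \pi_2, \Sigma, \Theta, \hat{\Sigma}, \hat{\Theta}, \hat{T}, \lambda$ are notation assumed here for illustration. The last line shows one common debiasing form from the de-sparsified Graphical Lasso literature; whether $\mathcal{DBSDA}$ uses precisely this form is an assumption.

$$
\begin{aligned}
&\text{LDA rule (shared covariance } \Sigma,\ \Theta = \Sigma^{-1}\text{): assign } x \text{ to class 1 if} \\
&\qquad \Big(x - \tfrac{\mu_1 + \mu_2}{2}\Big)^{\top} \Theta\, (\mu_1 - \mu_2) + \log\frac{\pi_1}{\pi_2} > 0, \\[4pt]
&\text{Graphical Lasso estimate of } \Theta \text{ (plugged in by sparse LDA):} \\
&\qquad \hat{\Theta} = \arg\min_{\Theta \succ 0}\ \operatorname{tr}(\hat{\Sigma}\,\Theta) - \log\det\Theta + \lambda \lVert \Theta \rVert_1, \\[4pt]
&\text{De-sparsified (debiased) estimate:} \\
&\qquad \hat{T} = 2\hat{\Theta} - \hat{\Theta}\,\hat{\Sigma}\,\hat{\Theta}.
\end{aligned}
$$

In this sketch, the $\ell_1$ penalty gives $\hat{\Theta}$ its sparsity and lower variance at the cost of shrinkage bias, and the correction term $\hat{\Theta} - \hat{\Theta}\hat{\Sigma}\hat{\Theta}$ removes the leading part of that bias. When $\hat{\Theta}$ is exactly $\hat{\Sigma}^{-1}$, the correction vanishes and $\hat{T} = \hat{\Theta}$.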
