De-biasing Covariance-Regularized Discriminant Analysis

Fisher’s Linear Discriminant Analysis (FLD) is a well-known technique for linear classification, feature extraction, and dimension reduction. The empirical FLD relies on two key estimates from the data: the mean vector of each class and the (inverse) covariance matrix. To improve the accuracy of FLD under High Dimension Low Sample Size (HDLSS) settings, Covariance-Regularized FLD (CRLD) has been proposed; it uses shrunken covariance estimators, such as the Graphical Lasso, to strike a balance between bias and variance. Although CRLD can achieve better classification accuracy, it typically incurs bias and converges to the optimal result at a slower asymptotic rate. Inspired by recent progress on the de-biased Lasso, we propose a novel FLD classifier, DBLD, which improves the classification accuracy of CRLD through de-biasing. Theoretical analysis shows that DBLD possesses better asymptotic properties than CRLD. We conduct experiments on both synthetic datasets and real application datasets to confirm our theoretical analysis and to demonstrate the superiority of DBLD over classical FLD, CRLD, and other competitors under HDLSS settings.
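To make the covariance-regularized construction concrete, the following is a minimal sketch (not the paper’s DBLD method itself) of a CRLD-style two-class discriminant: estimate the class means, fit a sparse precision matrix with scikit-learn’s `GraphicalLasso` on the pooled centered data, and classify along the regularized Fisher direction `w = Omega (mu1 - mu0)`. All variable names and the regularization strength `alpha=0.1` are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Illustrative sketch of a covariance-regularized Fisher discriminant (CRLD-style),
# NOT the paper's de-biased DBLD estimator. alpha is an assumed tuning value.
rng = np.random.default_rng(0)
p = 20
X0 = rng.normal(0.0, 1.0, size=(30, p))      # class-0 samples
X1 = rng.normal(0.5, 1.0, size=(30, p))      # class-1 samples (shifted mean)

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)  # empirical class means
Xc = np.vstack([X0 - mu0, X1 - mu1])         # pooled, within-class centered data

gl = GraphicalLasso(alpha=0.1).fit(Xc)       # shrunken (sparse) covariance estimate
w = gl.precision_ @ (mu1 - mu0)              # regularized Fisher direction
b = -0.5 * w @ (mu0 + mu1)                   # threshold at the midpoint of the means

def predict(x):
    """Return 1 for class 1, 0 for class 0."""
    return int(w @ x + b > 0)
```

Because the estimated precision matrix is positive definite, the midpoint rule always assigns each class mean to its own class; the de-biasing step proposed in the paper would further correct the bias that the Graphical Lasso shrinkage introduces into `w`.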
