The application of sparse estimation of covariance matrix to quadratic discriminant analysis

Background: Although Linear Discriminant Analysis (LDA) is commonly used for classification, it may not be directly applicable in genomics studies because of the "large p, small n" problem, in which the number of features far exceeds the number of samples. Different versions of sparse LDA have been proposed to address this challenge. One implicit assumption of LDA-based methods is that the covariance matrices are the same across classes. However, rewiring of genetic networks (and therefore differing covariance matrices) across diseases has been observed in many genomics studies, which suggests that LDA and its variations may be suboptimal for disease classification. It is not yet clear whether accounting for differing genetic networks across diseases can improve classification in genomics studies.

Results: We propose a sparse version of Quadratic Discriminant Analysis (SQDA) that explicitly accounts for differences in genetic networks across diseases. Both simulations and real data analyses are performed to compare the performance of SQDA with six commonly used classification methods.

Conclusions: SQDA provides more accurate classification results than the other methods for both simulated and real data. Our method should prove useful for classification in genomics studies and in other research settings where covariances differ among classes.
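For reference, the contrast between LDA and QDA that motivates this work can be written with the standard Gaussian discriminant scores. The notation below is an assumption introduced here for illustration and does not appear in the abstract: $x$ is a feature vector, $\mu_k$, $\Sigma_k$ and $\pi_k$ are the mean, covariance matrix and prior probability of class $k$, and $\Sigma$ is the covariance matrix shared by all classes under LDA. The abstract does not state which sparse estimator SQDA uses, so $\hat{\Sigma}_k$ stands for any sparse, invertible estimate of $\Sigma_k$.

```latex
% LDA: one covariance matrix shared by all classes
\delta_k^{\mathrm{LDA}}(x) = x^{\top}\Sigma^{-1}\mu_k
    - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k + \log \pi_k

% QDA: a separate covariance matrix for each class
\delta_k^{\mathrm{QDA}}(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
    - \tfrac{1}{2}\,(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k) + \log \pi_k

% Sparse QDA (sketch): plug a sparse estimate \hat{\Sigma}_k into the QDA score
\hat{k}(x) = \arg\max_k \Bigl\{ -\tfrac{1}{2}\log\lvert\hat{\Sigma}_k\rvert
    - \tfrac{1}{2}\,(x-\hat{\mu}_k)^{\top}\hat{\Sigma}_k^{-1}(x-\hat{\mu}_k)
    + \log \hat{\pi}_k \Bigr\}
```

When p is much larger than n, the sample covariance matrix of each class is singular, so the QDA score cannot be evaluated without some form of regularized or sparse estimate; this is the gap a sparse QDA is meant to fill.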
