Modified linear discriminant analysis approaches for classification of high-dimensional microarray data

Linear discriminant analysis (LDA) is one of the most popular classification methods. In high-dimensional microarray data, the number of samples is small and the number of features is large, so the within-group covariance matrix is singular and unstable, and classical LDA performs sub-optimally. Two modified LDA approaches (MLDA and NLDA) were applied to microarray classification, and their performance was compared with that of other popular classification algorithms across a range of feature set sizes (numbers of genes) using both simulated and real datasets. The results showed that the overall performance of the two modified LDA approaches was competitive with support vector machines and other regularized LDA approaches, and better than diagonal linear discriminant analysis, k-nearest neighbor, and classical LDA. It was concluded that the modified LDA approaches can serve as effective classification tools for high-dimensional microarray problems with limited sample sizes.
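The singularity problem described above can be illustrated with a minimal sketch. It is not the MLDA or NLDA procedure evaluated in the paper, whose details are not given here. It assumes simulated Gaussian data, and a generic ridge-type regularization (adding `alpha` times the identity to the pooled covariance) stands in for the modified estimators. All function names and parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def pooled_within_covariance(X, y):
    """Pooled within-group covariance: summed class scatter divided by (n - k)."""
    classes = np.unique(y)
    n, p = X.shape
    scatter = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        scatter += centered.T @ centered
    return scatter / (n - len(classes))

def ridge_lda_fit(X, y, alpha=1.0):
    """Fit linear discriminants using S + alpha * I (illustrative assumption)."""
    classes = np.unique(y)
    S_reg = pooled_within_covariance(X, y) + alpha * np.eye(X.shape[1])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Solve S_reg @ W = means.T rather than forming an explicit inverse.
    W = np.linalg.solve(S_reg, means.T)  # shape (p, k)
    b = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    return classes, W, b

def ridge_lda_predict(model, X):
    """Assign each row of X to the class with the largest discriminant score."""
    classes, W, b = model
    return classes[np.argmax(X @ W + b, axis=1)]

# Simulated "microarray-like" setting: 20 samples, 50 features, 2 groups.
rng = np.random.default_rng(0)
n_per_class, p = 10, 50
X = np.vstack([
    rng.normal(0.0, 1.0, (n_per_class, p)),
    rng.normal(1.0, 1.0, (n_per_class, p)),
])
y = np.repeat([0, 1], n_per_class)

S = pooled_within_covariance(X, y)
rank = np.linalg.matrix_rank(S)  # at most n - k, so S cannot be inverted when p > n - k

model = ridge_lda_fit(X, y, alpha=1.0)
train_accuracy = np.mean(ridge_lda_predict(model, X) == y)
```

Because the rank of the pooled covariance is bounded by the number of samples minus the number of groups, classical LDA has no well-defined inverse to work with in this setting. The regularized matrix is positive definite, so the linear system has a unique solution. The value of `alpha` here is arbitrary and would normally be chosen by cross-validation.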
