Hybrid PCA and LDA Analysis of Microarray Gene Expression Data

Microarray technology offers a high-throughput means to study expression networks and gene regulatory networks in cells. The high dimensionality and small sample size intrinsic to microarray data call for effective computational methods. In this paper, we propose a novel hybrid dimension reduction technique for classification: hybrid principal component analysis (PCA) and linear discriminant analysis (LDA). This technique solves the singular scatter matrix problem caused by small training samples and increases the effective dimension of the projected subspace. It also offers more flexibility than either method alone, providing a richer set of alternatives to LDA and PCA in the parametric space. In addition, we propose generalizing the hybrid analysis to other dimension reduction techniques, such as multiple discriminant analysis (MDA) and biased discriminant analysis (BDA). Extensive experiments on the yeast cell cycle regulation data set show that the hybrid analysis outperforms traditional methods such as SVM.
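The abstract does not spell out the hybrid criterion, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes a two-parameter blend: the numerator scatter is (1 - lam) * S_b + lam * S_total and the denominator scatter is (1 - eta) * S_w + eta * I. The function name `hybrid_pca_lda` and the parameters `lam` and `eta` are hypothetical. Under this assumption, lam = eta = 1 reduces to PCA, lam = eta = 0 reduces to LDA, and any eta > 0 keeps the denominator matrix invertible even when there are fewer samples than genes.

```python
import numpy as np

def hybrid_pca_lda(X, y, lam=0.5, eta=0.5, n_components=2):
    """Sketch of a blended PCA/LDA projection (assumed form, see lead-in).

    X: (n_samples, n_features) array, y: class labels.
    lam, eta: hypothetical blending weights in [0, 1]; eta must be > 0
    when the within-class scatter is singular (few samples, many features).
    Returns the projection matrix W and the global mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean
    S_total = Xc.T @ Xc / n              # total scatter (the PCA criterion)

    S_w = np.zeros((d, d))               # within-class scatter
    S_b = np.zeros((d, d))               # between-class scatter
    for c in np.unique(y):
        Xk = X[y == c]
        mk = Xk.mean(axis=0)
        S_w += (Xk - mk).T @ (Xk - mk) / n
        diff = (mk - mean)[:, None]
        S_b += (len(Xk) / n) * (diff @ diff.T)

    # Assumed hybrid criterion: blend numerator and denominator scatters.
    A = (1.0 - lam) * S_b + lam * S_total
    B = (1.0 - eta) * S_w + eta * np.eye(d)

    # Solve A w = mu B w by whitening with the Cholesky factor of B,
    # which exists because B is positive definite for eta > 0.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T
    M = (M + M.T) / 2.0                  # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1][:n_components]
    W = L_inv.T @ vecs[:, order]
    return W, mean
```

A projected sample is then `(x - mean) @ W`. Because the blended numerator includes the total scatter whenever lam > 0, its rank is not limited to the number of classes minus one, which is one way to read the abstract's claim about a larger effective dimension.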
