Combining LPP with PCA for microarray data clustering

DNA microarray technology has produced large amounts of gene expression data, and many machine learning techniques have recently been proposed to analyze them. In this paper, we cluster microarray data by combining the recently proposed locality preserving projection (LPP) method with principal component analysis (PCA), an approach we call PCA-LPP. PCA and PCA-LPP are compared using two clustering algorithms: K-means and agglomerative hierarchical clustering. It is already known that clustering on the components extracted by PCA, instead of on the original variables, improves cluster quality. Our empirical study further shows that applying LPP to the PCA components both reduces their dimensionality further and greatly improves cluster quality. In particular, the first few components obtained by PCA-LPP capture more information about the cluster structure than the first few components obtained by PCA.
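
The abstract does not give implementation details, so the following is only a minimal NumPy sketch of one plausible PCA-LPP pipeline, not the authors' code. It assumes that rows of the data matrix are the objects being clustered, that LPP uses a symmetric k-nearest-neighbour graph with heat-kernel weights and solves the generalized eigenproblem for the smallest eigenvalues (as in the original LPP formulation, where PCA is applied first so that the right-hand matrix is nonsingular), and that K-means is plain Lloyd iteration. The helper names `pca`, `lpp` and `kmeans`, and the parameter values (`k`, the kernel width `t`, the numbers of components) are hypothetical choices for illustration, and the input is synthetic toy data.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                       # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # n_samples x n_components


def lpp(Y, n_components, k=5, t=None):
    """Hypothetical LPP helper: solve Y^T L Y a = lam Y^T D Y a, keep smallest lam."""
    n = Y.shape[0]
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    if t is None:
        t = np.mean(sq)                           # heat-kernel width: an assumed default
    W = np.zeros((n, n))
    nbrs = np.argsort(sq, axis=1)[:, 1:k + 1]     # k nearest neighbours, skipping self
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-sq[i, nbrs[i]] / t)
    W = np.maximum(W, W.T)                        # symmetrize the neighbourhood graph
    D = np.diag(W.sum(axis=1))                    # degree matrix
    L = D - W                                     # graph Laplacian
    A = Y.T @ L @ Y
    B = Y.T @ D @ Y                               # positive definite when Y has full column rank
    Cinv = np.linalg.inv(np.linalg.cholesky(B))   # B = C C^T
    M = Cinv @ A @ Cinv.T                         # equivalent standard symmetric problem
    _, vecs = np.linalg.eigh((M + M.T) / 2)       # eigenvalues returned in ascending order
    P = Cinv.T @ vecs[:, :n_components]           # back-transform: a = C^{-T} y
    return Y @ P


def kmeans(Z, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd K-means with initial centres drawn at random from the data."""
    rng = np.random.default_rng(seed)
    centres = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        d = np.sum((Z[:, None, :] - centres[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(d, axis=1)
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


# Synthetic toy data: 3 groups of 20 samples over 200 "genes" (not real microarray data).
rng = np.random.default_rng(42)
profiles = rng.normal(scale=5.0, size=(3, 200))
X = np.vstack([p + rng.normal(size=(20, 200)) for p in profiles])

Y = pca(X, n_components=10)       # step 1: PCA
Z = lpp(Y, n_components=3, k=5)   # step 2: LPP on the PCA components
labels = kmeans(Z, n_clusters=3)  # step 3: cluster in the reduced space
print(Y.shape, Z.shape, np.bincount(labels, minlength=3))
```

Running PCA first keeps `Y^T D Y` invertible, which is why the Cholesky-based reduction to a standard symmetric eigenproblem is valid here. For the agglomerative hierarchical variant, the `kmeans` call could be replaced by, for example, SciPy's `scipy.cluster.hierarchy.linkage` followed by `fcluster` on `Z`.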
