Sparse p-norm Nonnegative Matrix Factorization for clustering gene expression data

Nonnegative Matrix Factorization (NMF) is a powerful tool for gene expression data analysis because it reduces thousands of genes to a few compact metagenes. It is especially useful for clustering gene expression samples in cancer class discovery. Enhancing the sparseness of the factorisation concentrates the representation on a few dominant, coexpressed metagenes and can improve clustering effectiveness. Sparse p-norm (p > 1) Nonnegative Matrix Factorization (Sp-NMF) produces sparser representations by using a higher-order norm to normalise the decomposed components. In this paper, we investigate the benefit of higher-order normalisation for clustering cancer-related gene expression samples. Experimental results demonstrate that Sp-NMF yields robust and effective clustering, both in automatically determining the number of clusters and in achieving high accuracy.
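The NMF clustering pipeline summarised above can be sketched in NumPy as follows, under stated assumptions. The sketch factorises a nonnegative genes × samples matrix V ≈ WH with the standard Lee and Seung multiplicative updates. Each sample is then assigned to the metagene with the largest coefficient in its column of H, following the metagene approach of Brunet et al. It does not reproduce the Sp-NMF update rule, whose details are not given here. The helper `normalise_columns_pnorm` and its `p` parameter are hypothetical: they only rescale each metagene (column of W) to unit p-norm after factorisation and move that scale into H, leaving WH unchanged. The input matrix is synthetic toy data, not real expression data.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-10, seed=0):
    """Factorise nonnegative V (genes x samples) as W @ H using the
    Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps  # genes x metagenes
    H = rng.random((k, n)) + eps  # metagenes x samples
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def normalise_columns_pnorm(W, H, p=2.0):
    """Hypothetical stand-in (not the Sp-NMF algorithm): scale each column
    of W to unit p-norm and rescale the matching row of H so W @ H is kept."""
    norms = np.sum(W ** p, axis=0) ** (1.0 / p)
    norms = np.maximum(norms, 1e-12)  # guard against an all-zero column
    return W / norms, H * norms[:, None]


def cluster_samples(H):
    """Assign each sample (column of H) to its dominant metagene."""
    return np.argmax(H, axis=0)


# Synthetic toy data: 40 genes x 10 samples in two groups.
# Samples 0-4 express genes 0-19 strongly; samples 5-9 express genes 20-39.
rng = np.random.default_rng(1)
group_a = np.vstack([rng.random((20, 5)) * 0.1 + 5.0, rng.random((20, 5)) * 0.1])
group_b = np.vstack([rng.random((20, 5)) * 0.1, rng.random((20, 5)) * 0.1 + 5.0])
V = np.hstack([group_a, group_b])

W_raw, H_raw = nmf(V, k=2)
W, H = normalise_columns_pnorm(W_raw, H_raw, p=3.0)
labels = cluster_samples(H)
print(labels)
```

Normalising W before taking the argmax makes the rows of H comparable across metagenes. Without it, a metagene whose column in W happens to have a small norm would receive correspondingly inflated coefficients in H.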
