Molecular Cancer Class Discovery Using Non-negative Matrix Factorization with Sparseness Constraint

In cancer diagnosis and treatment, clustering based on gene expression data has been shown to be a powerful method in cancer class discovery. In this paper, we discuss the use of nonnegative matrix factorization with sparseness constraints (NMFSC), a method which can be used to learn a parts representation of the data, to analysis gene expression data. We illustrate how to choose appropriate sparseness factors in the algorithm and demonstrate the improvement of NMFSC by direct comparison with the nonnegative matrix factorization (NMF). In addition, when using it on the two well-studied datasets, we obtain pretty much the same results with the sparse non-negative matrix factorization (SNMF).

[1]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[2]  Pablo Tamayo,et al.  Metagenes and molecular pattern discovery using matrix factorization , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[3]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[5]  Yuan Gao,et al.  Improving molecular cancer class discovery through sparse non-negative matrix factorization , 2005 .

[6]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[7]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[8]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..

[9]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[10]  Stan Z. Li,et al.  Learning spatially localized, parts-based representation , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[11]  Patrik O. Hoyer,et al.  Non-negative sparse coding , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[12]  Yunde Jia,et al.  FISHER NON-NEGATIVE MATRIX FACTORIZATION FOR LEARNING LOCAL FEATURES , 2004 .

[13]  Terrence J. Sejnowski,et al.  The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[14]  Carlo Di Bello,et al.  PCA disjoint models for multiclass cancer analysis using gene expression data , 2003, Bioinform..

[15]  Christian A. Rees,et al.  Molecular portraits of human breast tumours , 2000, Nature.

[16]  Xin Liu,et al.  Document clustering based on non-negative matrix factorization , 2003, SIGIR.

[17]  E. Oja,et al.  Independent Component Analysis , 2013 .

[18]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[19]  J. Mesirov,et al.  Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[20]  T. Poggio,et al.  Prediction of central nervous system embryonal tumour outcome based on gene expression , 2002, Nature.

[21]  Michael W. Berry,et al.  Document clustering using nonnegative matrix factorization , 2006, Inf. Process. Manag..

[22]  Jill P. Mesirov,et al.  Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data , 2003, Machine Learning.

[23]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .