Tumor Clustering Using Nonnegative Matrix Factorization With Gene Selection

Tumor clustering is becoming a powerful method for cancer class discovery. Nonnegative matrix factorization (NMF) has shown advantages over conventional clustering techniques, yet there is still considerable room to improve its performance. To this end, this paper introduces gene selection and explicit sparseness enforcement into the factorization process. Specifically, independent component analysis is employed to select a subset of genes so that the effect of irrelevant or noisy genes is reduced. NMF and two of its extensions, sparse NMF and NMF with sparseness constraint, are then used to cluster tumors on the selected genes. A series of experiments varying the number of clusters and the number of selected genes evaluates how different gene selection settings interact with NMF-based clustering. Experiments on three representative gene expression datasets demonstrate that the proposed scheme can achieve better clustering results.
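The two-stage scheme described above (gene selection by independent component analysis, then NMF-based clustering on the selected genes) can be illustrated with a minimal NumPy sketch. Several parts are assumptions rather than the paper's method: the abstract does not state the gene ranking rule, so genes are scored here by their largest absolute loading on any independent component; the ICA step is a hand-written FastICA-style iteration with a tanh nonlinearity; and only plain NMF with multiplicative updates is shown, not the sparse variants. The function names `ica_gene_scores`, `select_genes`, and `nmf_cluster` are hypothetical.

```python
import numpy as np

def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def ica_gene_scores(V, n_components, n_iter=100, seed=0):
    """Score genes by their largest absolute loading on any independent component.

    V is a genes x samples expression matrix. The scoring rule is an
    assumption made for illustration, not taken from the paper.
    """
    X = V.T - V.T.mean(axis=1, keepdims=True)        # samples x genes, each sample centered
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])      # sample-by-sample covariance
    top = np.argsort(d)[::-1][:n_components]         # leading eigenpairs
    Z = (E[:, top] / np.sqrt(d[top])).T @ X          # whitened data, n_components x genes
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):                          # symmetric fixed-point updates
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / Z.shape[1] - (1.0 - G**2).mean(axis=1)[:, None] * W
        W = _sym_decorrelate(W_new)
    return np.abs(W @ Z).max(axis=0)                 # one score per gene

def select_genes(V, n_genes, n_components):
    """Return the sorted row indices of the n_genes highest-scoring genes."""
    scores = ica_gene_scores(V, n_components)
    return np.sort(np.argsort(scores)[::-1][:n_genes])

def nmf_cluster(V, k, n_iter=300, seed=0, eps=1e-9):
    """Factor V (genes x samples) as W @ H and assign each sample to a cluster.

    Uses multiplicative updates for the squared-error objective. Columns of W
    are rescaled to unit sum so that the rows of H are comparable before
    taking the argmax.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    scale = W.sum(axis=0) + eps
    W, H = W / scale, H * scale[:, None]
    return H.argmax(axis=0), W, H

# Usage on a nonnegative genes x samples matrix V (values here are placeholders):
# genes = select_genes(V, n_genes=40, n_components=2)
# labels, W, H = nmf_cluster(V[genes], k=2)
```

In this sketch the number of selected genes and the number of clusters are free parameters, mirroring the two quantities the experiments vary. Rescaling the columns of W before the argmax is a design choice made here because NMF factors are only determined up to a per-component scale.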
