Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification

Advances in DNA microarray technologies have made gene expression profiles a promising basis for identifying different types of cancer. Traditional learning-based cancer identification methods train a classifier on labeled samples only, which limits their practical use because labels are expensive to obtain in clinical cancer research. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) that learns an effective classifier from both labeled and unlabeled samples, thereby improving subsequent cancer classification. In particular, Semi-PNMF jointly learns a non-negative subspace from the concatenated labeled and unlabeled samples and indicates each sample's class by the position of the maximum entry of its coefficient vector. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples into the learned subspace, it can learn more representative subspaces and improve classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. Experiments on two multiclass cancer gene expression datasets show that Semi-PNMF outperforms representative existing methods.
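The abstract does not give the Semi-PNMF objective or its update rule, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It fits plain (unsupervised) projective NMF, X ≈ W Wᵀ X with W ≥ 0, to labeled and unlabeled samples stacked as the columns of one matrix. It uses the multiplicative update W ← W ∘ (2XXᵀW) / (WWᵀXXᵀW + XXᵀWWᵀW), with elementwise product and division, that is commonly used for PNMF, followed by rescaling W to unit spectral norm as a stabilizing heuristic. Each sample's class is then read off as the position of the maximum entry of its coefficient vector Wᵀx. No label term enters the factorization here; instead, a hypothetical majority-vote step uses the labeled samples to map components to class labels. The function names (`pnmf`, `component_to_class`, `predict`) and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pnmf(X, r, n_iter=300, eps=1e-12, seed=0):
    """Plain projective NMF: find W >= 0 (features x r) with X ~= W @ W.T @ X."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r))          # random non-negative start
    A = X @ X.T                              # feature Gram matrix X X^T, fixed
    for _ in range(n_iter):
        AW = A @ W
        # Multiplicative update: W <- W * 2 A W / (W W^T A W + A W W^T W)
        W *= (2.0 * AW) / (W @ (W.T @ AW) + AW @ (W.T @ W) + eps)
        W /= np.linalg.norm(W, 2)            # heuristic: rescale to spectral norm 1
    return W

def reconstruction_error(X, W):
    """Squared Frobenius error ||X - W W^T X||_F^2."""
    return float(np.linalg.norm(X - W @ (W.T @ X)) ** 2)

def component_to_class(W, X_lab, y_lab, r):
    """Hypothetical post-hoc step: label each component by majority vote of
    the labeled samples whose largest coefficient falls on that component."""
    comp = np.argmax(W.T @ X_lab, axis=0)
    mapping = np.arange(r)
    for k in range(r):
        hits = y_lab[comp == k]
        if hits.size:
            mapping[k] = np.bincount(hits).argmax()
    return mapping

def predict(W, X, mapping):
    """Class = position of the maximum entry of each sample's coefficients W^T x."""
    return mapping[np.argmax(W.T @ X, axis=0)]

# Synthetic toy data (not from the paper): 3 classes, 30 genes, each class
# over-expresses its own block of 10 genes. Columns are samples.
rng = np.random.default_rng(1)
n_classes, n_genes, n_per = 3, 30, 20
cols, labels = [], []
for c in range(n_classes):
    for _ in range(n_per):
        x = 0.1 * rng.random(n_genes)
        x[10 * c:10 * (c + 1)] += 1.0
        cols.append(x)
        labels.append(c)
X_all = np.column_stack(cols)                # labeled and unlabeled columns together
y_all = np.array(labels)
# Only two samples per class are treated as labeled.
lab_idx = np.array([c * n_per + i for c in range(n_classes) for i in range(2)])

W = pnmf(X_all, r=n_classes)
mapping = component_to_class(W, X_all[:, lab_idx], y_all[lab_idx], n_classes)
y_pred = predict(W, X_all, mapping)
```

The rescaling step matters for this sketch: without it, the plain update can oscillate in the overall scale of W (in the one-dimensional case it maps w to 1/w), so the reconstruction error need not settle. With ‖W‖₂ = 1, the matrix WWᵀ has eigenvalues in [0, 1], so the reconstruction error can never exceed ‖X‖²_F.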
