Versatile sparse matrix factorization: Theory and applications

Abstract: In recent years, non-negative matrix factorization and sparse representation models have been successfully applied in high-throughput biological data analysis owing to their interpretability and robustness to noise. In this paper, we propose a unified matrix factorization model, which we call the versatile sparse matrix factorization (VSMF) model, for biological data analysis. We discuss the modelling, optimization, and applications of VSMF. We show that many well-known sparse matrix factorization models are special cases of VSMF. Sparsity, smoothness, and non-negativity can each be controlled in VSMF through its tuning parameters. Our computational experiments on feature extraction, feature selection, and clustering corroborate the advantages of VSMF.
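To make the idea of parameter-controlled sparsity, smoothness, and non-negativity concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the objective, so the sketch assumes an elastic-net style formulation: a squared reconstruction error plus L1 (sparsity) and squared L2 (smoothness) penalties on both factors, with optional non-negativity switches. The names `factorize`, `alpha1`, `alpha2`, `lambda1`, `lambda2`, `nonneg_A`, and `nonneg_Y` are hypothetical, and the alternating proximal gradient solver is a simple stand-in chosen for brevity rather than the optimization method of the paper.

```python
import numpy as np

def objective(X, A, Y, alpha1, alpha2, lambda1, lambda2):
    # Assumed objective: reconstruction error plus L1 and squared-L2 penalties.
    residual = X - A @ Y
    return (0.5 * np.sum(residual ** 2)
            + alpha1 * np.abs(A).sum() + 0.5 * alpha2 * np.sum(A ** 2)
            + lambda1 * np.abs(Y).sum() + 0.5 * lambda2 * np.sum(Y ** 2))

def prox(Z, threshold, nonneg):
    # Proximal map of the L1 penalty, optionally combined with Z >= 0.
    if nonneg:
        return np.maximum(Z - threshold, 0.0)
    return np.sign(Z) * np.maximum(np.abs(Z) - threshold, 0.0)

def update_coefficients(X, A, Y, l1, l2, nonneg):
    # One proximal gradient step on Y with A fixed, using step size 1/L,
    # where L bounds the curvature of the smooth part of the objective.
    L = np.linalg.norm(A.T @ A, 2) + l2 + 1e-12
    grad = A.T @ (A @ Y - X) + l2 * Y
    return prox(Y - grad / L, l1 / L, nonneg)

def factorize(X, k, alpha1=0.0, alpha2=0.1, lambda1=0.1, lambda2=0.0,
              nonneg_A=True, nonneg_Y=True, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], k))   # basis matrix
    Y = rng.random((k, X.shape[1]))   # coefficient matrix
    history = []
    for _ in range(n_iter):
        Y = update_coefficients(X, A, Y, lambda1, lambda2, nonneg_Y)
        # Update A through the transposed problem X^T ~ Y^T A^T.
        A = update_coefficients(X.T, Y.T, A.T, alpha1, alpha2, nonneg_A).T
        history.append(objective(X, A, Y, alpha1, alpha2, lambda1, lambda2))
    return A, Y, history

if __name__ == "__main__":
    X = np.random.default_rng(1).random((20, 30))
    A, Y, history = factorize(X, k=3)
    print("final objective:", history[-1])
    print("fraction of zeros in Y:", np.mean(Y == 0))
```

Under these assumptions, setting both non-negativity switches and all penalties to zero gives a plain NMF-style problem, while turning the switches off and raising `lambda1` gives a sparse coding style problem, which illustrates how a single parameterized model can cover several familiar factorizations.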
