The non-negative matrix factorization toolbox for biological data mining

BackgroundNon-negative matrix factorization (NMF) has been introduced as an important method for mining biological data. Though there currently exists packages implemented in R and other programming languages, they either provide only a few optimization algorithms or focus on a specific application field. There does not exist a complete NMF package for the bioinformatics community, and in order to perform various data mining tasks on biological data.ResultsWe provide a convenient MATLAB toolbox containing both the implementations of various NMF techniques and a variety of NMF-based data mining approaches for analyzing biological data. Data mining approaches implemented within the toolbox include data clustering and bi-clustering, feature extraction and selection, sample classification, missing values imputation, data visualization, and statistical comparison.ConclusionsA series of analysis such as molecular pattern discovery, biological process identification, dimension reduction, disease prediction, visualization, and statistical comparison can be performed using this toolbox.

[1]  Alioune Ngom,et al.  Classification approach based on non-negative least squares , 2013, Neurocomputing.

[2]  Renaud Gaujoux,et al.  A flexible R package for nonnegative matrix factorization , 2010, BMC Bioinformatics.

[3]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  A. Kudlicki,et al.  Logic of the Yeast Metabolic Cycle: Temporal Compartmentalization of Cellular Processes , 2005, Science.

[5]  Chengyu Liu,et al.  Biclustering of gene expression data by non-smooth non-negative matrix factorization , 2010 .

[6]  A. Godwin,et al.  Detection of treatment-induced changes in signaling pathways in gastrointestinal stromal tumors using transcriptomic data. , 2009, Cancer research.

[7]  Brad T. Sherman,et al.  Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists , 2008, Nucleic acids research.

[8]  Hyunsoo Kim,et al.  Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares , 2006 .

[9]  M. Ringnér,et al.  Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks , 2001, Nature Medicine.

[10]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[11]  Philip M. Kim,et al.  Subsystem identification through dimensionality reduction of large-scale gene expression data. , 2003, Genome research.

[12]  Dmitrij Frishman,et al.  MIPS: a database for genomes and protein sequences , 1999, Nucleic Acids Res..

[13]  Yingdong Zhao,et al.  Non-negative matrix factorization of gene expression profiles: a plug-in for BRB-ArrayTools , 2009, Bioinform..

[14]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[15]  Guoli Wang,et al.  LS-NMF: A modified non-negative matrix factorization algorithm utilizing uncertainty estimates , 2006, BMC Bioinformatics.

[16]  Pablo Tamayo,et al.  Metagenes and molecular pattern discovery using matrix factorization , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[17]  David Botstein,et al.  GO: : TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes , 2004, Bioinform..

[18]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[19]  Daoqiang Zhang,et al.  Non-negative Matrix Factorization on Kernels , 2006, PRICAI.

[20]  Jie Ding,et al.  CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data , 2010, Bioinform..

[21]  R. Tibshirani,et al.  Regression shrinkage and selection via the lasso: a retrospective , 2011 .

[22]  Michael F. Ochs,et al.  Matrix factorization for transcriptional regulatory network inference , 2012, 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).

[23]  Sarah A. Pendergrass,et al.  A Core MYC Gene Expression Signature Is Prominent in Basal-Like Breast Cancer but Only Partially Overlaps the Core Serum Response , 2009, PloS one.

[24]  Arlindo L. Oliveira,et al.  Biclustering algorithms for biological data analysis: a survey , 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[25]  KlingenbergBradley,et al.  Non-negative matrix factorization , 2009 .

[26]  Purvesh Khatri,et al.  Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate , 2003, Nucleic Acids Res..

[27]  Sayan Mukherjee,et al.  Estimating Dataset Size Requirements for Classifying DNA Microarray Data , 2003, J. Comput. Biol..

[28]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[29]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform..

[30]  Brad T. Sherman,et al.  Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources , 2008, Nature Protocols.

[31]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[32]  Alioune Ngom,et al.  A new Kernel non-negative matrix factorization and its application in microarray data analysis , 2012, 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).

[33]  A. Nobel,et al.  The molecular portraits of breast tumors are conserved across microarray platforms , 2006, BMC Genomics.

[34]  Chris H. Q. Ding,et al.  Orthogonal nonnegative matrix t-factorizations for clustering , 2006, KDD '06.

[35]  Hyunsoo Kim,et al.  Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method , 2008, SIAM J. Matrix Anal. Appl..