Rank Selection in Non-negative Matrix Factorization: systematic comparison and a new MAD metric

Non-negative matrix factorization (NMF) is a powerful dimensionality reduction and factorization method that provides a parts-based representation of the data. When the latent dimensionality of the data is not known a priori, the rank of the reduced representation must be selected. Several rank selection methods have been proposed, but there is no consensus on when each method is suitable. In this work, we propose a new metric for rank selection based on imputation cross-validation, and we systematically compare it against six other metrics while assessing the effects of data properties. Using synthetic datasets with different properties, we show that most methods fail to identify the true rank and that the properties of the data heavily affect how well each method performs. Imputation-based metrics, including our new MADimput, provided the best accuracy irrespective of the data type, but no method worked perfectly in all circumstances. One should therefore carefully assess the characteristics of a dataset in order to identify the most suitable metric for rank selection.
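To make the idea of imputation cross-validation concrete, the following is a minimal sketch and not the procedure evaluated in this work. It hides a random fraction of the entries, fits NMF on the remaining entries only, and scores each candidate rank by how well the hidden entries are reconstructed. The function names (`masked_nmf`, `imputation_error`, `select_rank`), the use of masked multiplicative updates, the 20% hold-out fraction, and the median absolute error used as the score are all assumptions made for illustration; in particular, the score should not be read as the definition of MADimput.

```python
import numpy as np

def masked_nmf(X, mask, rank, n_iter=300, seed=0, eps=1e-9):
    """Fit X ~ W @ H using only entries where mask is True.

    Uses masked multiplicative updates for the squared error
    (an illustrative choice, not necessarily the paper's solver).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    Xm = X * mask  # held-out entries contribute nothing to the fit
    for _ in range(n_iter):
        WH = (W @ H) * mask
        H *= (W.T @ Xm) / (W.T @ WH + eps)
        WH = (W @ H) * mask
        W *= (Xm @ H.T) / (WH @ H.T + eps)
    return W, H

def imputation_error(X, rank, holdout_frac=0.2, seed=0):
    """Median absolute error on held-out entries (assumed score)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) >= holdout_frac  # True = observed
    W, H = masked_nmf(X, mask, rank, seed=seed)
    residuals = (X - W @ H)[~mask]  # errors on the hidden entries only
    return float(np.median(np.abs(residuals)))

def select_rank(X, ranks, holdout_frac=0.2, seed=0):
    """Return the candidate rank with the lowest imputation error."""
    errors = {k: imputation_error(X, k, holdout_frac, seed) for k in ranks}
    best = min(errors, key=errors.get)
    return best, errors

# Synthetic non-negative data with a known rank of 3, plus small noise.
rng = np.random.default_rng(42)
X = rng.random((30, 3)) @ rng.random((3, 20))
X = X + 0.01 * rng.random(X.shape)

ranks = [1, 2, 3, 4, 5, 6]
best_rank, errors = select_rank(X, ranks)
print("selected rank:", best_rank)
for k in ranks:
    print(k, round(errors[k], 4))
```

In practice one would repeat the hold-out over several random masks and initializations and aggregate the scores, since a single split can be noisy and, as noted above, no metric recovers the true rank in all circumstances.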
