Paper information - Rank Selection in Nonnegative Matrix Factorization using Minimum Description Length

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and one that holds the coefficients of all the data points in that subspace. In principle, the nonnegativity constraint forces the representation to be sparse and parts-based. Instead of extracting holistic features from the data, it extracts real parts that should be significantly easier to interpret and analyze. The size of the new subspace determines how many features are extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data and accurately predicts the known size of synthetic data. We provide a MATLAB implementation of our code.
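As a rough illustration of the trade-off the abstract describes, the following Python sketch factorizes a synthetic nonnegative matrix at several candidate subspace sizes and scores each with a generic two-part cost (a data-fit term plus a parameter-count penalty). This is not the authors' MDL criterion, which is not given here: the BIC-style score, the multiplicative-update solver, and the names `nmf`, `two_part_score`, and `best_rank` are all assumptions chosen for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate nonnegative V (m x n) as W @ H with W (m x rank), H (rank x n).

    Uses the standard multiplicative updates for the squared-error objective.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps  # basis of the new subspace
    H = rng.random((rank, n)) + eps  # coefficients of each data point
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def two_part_score(V, W, H):
    """Hypothetical BIC-style stand-in for a description length.

    data_cost grows with the residual; model_cost grows with the number
    of entries in W and H. Lower is better.
    """
    n_obs = V.size
    rss = np.sum((V - W @ H) ** 2)
    data_cost = 0.5 * n_obs * np.log(rss / n_obs + 1e-12)
    model_cost = 0.5 * (W.size + H.size) * np.log(n_obs)
    return data_cost + model_cost


# Synthetic data with a known subspace size of 3, plus a little noise.
rng = np.random.default_rng(1)
true_rank = 3
V = rng.random((40, true_rank)) @ rng.random((true_rank, 60))
V += 0.01 * rng.random(V.shape)

# Score each candidate subspace size and keep the one with the lowest cost.
scores = {}
for r in range(1, 7):
    W, H = nmf(V, r)
    scores[r] = two_part_score(V, W, H)

best_rank = min(scores, key=scores.get)
print("scores by rank:", scores)
print("selected rank:", best_rank)
```

The penalty term is what stops the selection from always preferring the largest subspace: adding components keeps lowering the residual, but each extra component adds a column to `W` and a row to `H` that must also be paid for.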

Adam Prügel-Bennett | Mahesan Niranjan | Steven Squires

[1] Patrick O. Perry, et al. Bi-cross-validation of the SVD and the nonnegative matrix factorization, 2009, 0908.2062.

[2] Victor Solo, et al. Tuning parameter selection for nonnegative matrix factorization, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[3] C. S. Wallace, et al. An Information Measure for Classification, 1968, Comput. J.

[4] Vikas Sindhwani, et al. Rank Selection in Low-rank Matrix Approximations: A Study of Cross-Validation for NMFs, 2010.

[5] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[6] Nicolas Gillis, et al. The Why and How of Nonnegative Matrix Factorization, 2014, ArXiv.

[7] Karthik Devarajan, et al. Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology, 2008, PLoS Comput. Biol.

[8] Perry R. Cook, et al. Bayesian Nonparametric Matrix Factorization for Recorded Music, 2010, ICML.

[9] Patrik O. Hoyer, et al. Non-negative Matrix Factorization with Sparseness Constraints, 2004, J. Mach. Learn. Res.

[10] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.