A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing

Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional non-negative matrix $V$ into the product of two non-negative matrices, $W$ and $H$, such that $V \sim WH$. It has been shown to yield a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas, such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology, for the analysis and interpretation of large-scale data. A related statistical latent class modeling approach, probabilistic latent semantic indexing (PLSI), has been developed in parallel for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for $W$ and $H$. In addition, we generalize the relationship between NMF and PLSI within this framework. Using real-life and simulated document clustering data, we demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods.
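To make the idea of multiplicative updates concrete, the following is a minimal sketch of the classical Lee-Seung updates that minimize the generalized Kullback-Leibler divergence, which corresponds to a Poisson likelihood for $V$ given $WH$. It is not the Renyi-divergence algorithm proposed in the paper; it only illustrates the special case that such a framework generalizes. The function names (`matmul`, `kl_divergence`, `nmf_kl`), the toy count matrix, the rank, and the iteration count are all assumptions chosen for illustration.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), taking 0 * log(0) as 0."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * math.log(v / max(wh, eps))
            total += wh - v
    return total


def nmf_kl(V, rank, n_iter=100, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for the KL (Poisson) objective.

    Illustrative sketch only: returns W (n x rank), H (rank x m) and the
    divergence recorded after each full iteration.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialization (an assumption of this sketch).
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = []
    for _ in range(n_iter):
        # Update H with W fixed: H_aj *= sum_i W_ia V_ij/(WH)_ij / sum_i W_ia
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        for a in range(rank):
            col_sum = sum(W[i][a] for i in range(n)) + eps
            for j in range(m):
                H[a][j] *= sum(W[i][a] * R[i][j] for i in range(n)) / col_sum
        # Update W with H fixed: W_ia *= sum_j H_aj V_ij/(WH)_ij / sum_j H_aj
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        for a in range(rank):
            row_sum = sum(H[a][j] for j in range(m)) + eps
            for i in range(n):
                W[i][a] *= sum(H[a][j] * R[i][j] for j in range(m)) / row_sum
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history


# Toy term-by-document count matrix with two rough blocks (hypothetical data).
V = [
    [5, 4, 0, 1, 0],
    [4, 6, 1, 0, 0],
    [0, 1, 7, 5, 6],
    [1, 0, 6, 8, 5],
]
W, H, history = nmf_kl(V, rank=2, n_iter=50)
print("first and last divergence:", history[0], history[-1])
```

The recorded divergence values should be non-increasing from one iteration to the next, which is the monotonicity property the abstract refers to, here only for the KL special case. The plain-Python loops favor readability over speed; a practical implementation would use a numerical array library.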
