Joint factor analysis and latent clustering

Many real-world datasets exhibit structure in the form of physically meaningful clusters; for example, news documents can be categorized as sports, politics, entertainment, and so on. Taking these clusters into account together with low-rank structure may yield parsimonious matrix and tensor factorization models and more powerful data analytics. Prior work has used data-domain similarity to improve nonnegative matrix factorization. Here we are instead interested in joint low-rank factorization and latent-domain clustering, that is, in clustering the latent reduced-dimension representations of the observed entities. We propose a unified algorithmic framework that handles both matrix and tensor factorization together with latent clustering. Numerical results on synthetic and real document data show that the proposed approach can significantly improve factor analysis and clustering accuracy.
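To make the idea of latent-domain clustering concrete, the following is a minimal sketch for the matrix case only. It is not the algorithm proposed in this work. It assumes a simplified objective, ||X - W H^T||_F^2 + lam ||W - C||_F^2 + mu ||H||_F^2, where each row of C is the centroid assigned to the corresponding row of W. The function names, the parameters `lam` and `mu`, the ridge term on H, and the plain alternating least-squares updates are all illustrative assumptions, and the constraints used in the actual framework (for example, nonnegativity) are omitted.

```python
import numpy as np


def assign_clusters(W, M):
    """Assign each row of W to its nearest centroid (row of M)."""
    d = ((W[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def joint_factor_cluster(X, rank, n_clusters, lam=1.0, mu=1e-3,
                         n_iter=50, seed=0):
    """Illustrative alternating scheme (an assumption, not the paper's method).

    Alternates between two least-squares factor updates and a
    k-means-style step applied to the latent rows of W.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.standard_normal((n, rank))
    eye = np.eye(rank)

    # Initialise centroids from randomly chosen latent rows.
    M = W[rng.choice(n, size=n_clusters, replace=False)].copy()
    labels = assign_clusters(W, M)

    for _ in range(n_iter):
        # H step: ridge least squares for X^T ~ H W^T.
        H = np.linalg.solve(W.T @ W + mu * eye, W.T @ X).T

        # W step: fit the data while pulling each row toward its centroid.
        C = M[labels]
        W = np.linalg.solve(H.T @ H + lam * eye, (X @ H + lam * C).T).T

        # Clustering step in the latent domain (one k-means sweep).
        labels = assign_clusters(W, M)
        for k in range(n_clusters):
            members = W[labels == k]
            if len(members) > 0:
                M[k] = members.mean(axis=0)

    # Make the returned labels consistent with the final centroids.
    labels = assign_clusters(W, M)
    return W, H, labels, M
```

Calling `joint_factor_cluster(X, rank=2, n_clusters=2)` on an `n x m` data matrix returns the latent representations `W`, the second factor `H`, one cluster label per row of `X`, and the latent centroids. Setting `lam=0` in this sketch decouples the two parts, which reduces it to factorization followed by separate clustering of the latent rows.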
