Clustering time-stamped data using multiple nonnegative matrices factorization

Time-stamped data, such as Twitter data, academic papers, and sensor data, are ubiquitous in daily life. Finding clusters and their evolutionary trends in time-stamped data sets is receiving increasing attention from researchers. Most existing methods, however, cluster a data set without using the time-stamp information that is inherent in almost all data objects. Incorporating this information effectively into the clustering process can both improve performance on most data sets and reveal the evolutionary trends of the clusters. In this paper, we introduce an approach for clustering time-stamped data and discovering the evolutionary trends of the clusters by using Multiple Nonnegative Matrices Factorization (MNMF) with a smooth constraint over time. To utilize the time-stamp information in the clustering process, our method constructs an extra object-time matrix. We then jointly factorize the multiple feature matrices together with the object-time matrix, applying the smooth constraint to the object-time matrix, to obtain the clusters and their evolutionary trends. Experimental results on real data sets demonstrate that our proposed approach outperforms the comparative algorithms with respect to Fscore, NMI or Entropy.
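
To make the idea of jointly factorizing a feature matrix and an object-time matrix more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a single feature matrix `X` (objects by features), an object-time matrix `T` (objects by time steps), a shared cluster-indicator factor `W`, and a smoothness penalty that discourages large changes between adjacent time columns of the time factor `G`. The objective, the weights `alpha` and `lam`, the chain-graph penalty, and the multiplicative updates (in the style of graph-regularized NMF) are all assumptions made for illustration; the function name `joint_nmf_smooth` is hypothetical.

```python
import numpy as np

def joint_nmf_smooth(X, T, k, alpha=1.0, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Illustrative joint NMF: X ~ W @ H and T ~ W @ G with a shared W.

    Assumed objective (not taken from the paper):
        ||X - W H||^2 + alpha * ||T - W G||^2 + lam * sum_s ||G[:, s] - G[:, s+1]||^2
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    t = T.shape[1]
    W = rng.random((n, k)) + eps   # object-by-cluster weights (shared factor)
    H = rng.random((k, d)) + eps   # cluster-by-feature basis
    G = rng.random((k, t)) + eps   # cluster-by-time trend

    # Chain graph over time steps: adjacency A, degree D, Laplacian L = D - A.
    A = np.zeros((t, t))
    idx = np.arange(t - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    D = np.diag(A.sum(axis=1))
    L = D - A

    def loss():
        fit_x = np.linalg.norm(X - W @ H) ** 2
        fit_t = alpha * np.linalg.norm(T - W @ G) ** 2
        smooth = lam * np.trace(G @ L @ G.T)  # equals lam * sum of squared adjacent differences
        return fit_x + fit_t + smooth

    history = [loss()]
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        W *= (X @ H.T + alpha * T @ G.T) / (W @ (H @ H.T + alpha * G @ G.T) + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        G *= (alpha * W.T @ T + lam * G @ A) / (alpha * W.T @ W @ G + lam * G @ D + eps)
        history.append(loss())
    return W, H, G, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((12, 6))   # synthetic object-feature matrix
    T = rng.random((12, 5))   # synthetic object-time matrix
    W, H, G, history = joint_nmf_smooth(X, T, k=3, n_iter=50)
    labels = W.argmax(axis=1)  # assign each object to its strongest cluster
    print(labels)
    print(history[0], history[-1])
```

Under these assumptions, each object can be assigned to the cluster with the largest entry in its row of `W`, and each row of `G` can be read as that cluster's weight across time steps. Increasing `lam` makes those rows vary more smoothly.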
