Evolutionary Clustering by Hierarchical Dirichlet Process with Hidden Markov State

This paper studies evolutionary clustering, a recently emerging topic with many important applications, notably in social network analysis. Building on the recent literature on the Hierarchical Dirichlet Process (HDP) and the Hidden Markov Model (HMM), we develop a statistical model, HDP-HTM, that combines HDP with a Hierarchical Transition Matrix (HTM) based on the proposed Infinite Hierarchical Hidden Markov State model (iH2MS) as an effective solution to this problem. HDP-HTM substantially advances the literature on evolutionary clustering: not only does it outperform existing methods, but, more importantly, it automatically learns the number and structure of the clusters while explicitly addressing the correspondence issue during the evolution. Extensive evaluations demonstrate the effectiveness and promise of this solution against the state of the art.
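The Dirichlet process machinery the abstract builds on is often presented via the stick-breaking construction: mixture weights are obtained by repeatedly breaking off Beta-distributed fractions of a unit-length stick, which yields a countably infinite set of weights without fixing the number of clusters in advance. The sketch below is a minimal, truncated illustration of that construction only; it is not the paper's HDP-HTM algorithm, and the function name and parameters are ours.

```python
import numpy as np

def stick_breaking_weights(alpha, num_sticks, rng=None):
    """Truncated stick-breaking draw from a Dirichlet process.

    alpha: concentration parameter (larger alpha -> weights spread
    over more components, i.e. more clusters on average).
    num_sticks: truncation level for the infinite construction.
    Returns an array of nonnegative weights summing to at most 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Beta(1, alpha) fractions broken off the remaining stick.
    betas = rng.beta(1.0, alpha, size=num_sticks)
    # Length of stick remaining before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

weights = stick_breaking_weights(alpha=2.0, num_sticks=50,
                                 rng=np.random.default_rng(0))
```

Because the construction is truncated at `num_sticks`, the weights sum to slightly less than one; the missing mass shrinks geometrically with the truncation level, which is why truncated samplers of this kind are practical.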
