Dynamic Non-Parametric Mixture Models and the Recurrent Chinese Restaurant Process: with Applications to Evolutionary Clustering

Clustering is an important data mining task for exploring and visualizing different types of data, such as news stories, scientific publications, and weblogs. Because these data evolve over time, evolutionary clustering, also known as dynamic clustering, has recently emerged to address the challenge of mining temporally smooth clusters. A good evolutionary clustering algorithm should fit the data well at each time epoch and, at the same time, result in a smooth cluster evolution that gives the data analyst a coherent and easily interpretable model. In this paper we introduce the temporal Dirichlet process mixture model (TDPM) as a framework for evolutionary clustering. TDPM is a generalization of the Dirichlet process mixture (DPM) framework for clustering, which automatically grows the number of clusters with the data. In our framework, the data are divided into epochs; all data points inside the same epoch are assumed to be fully exchangeable, whereas the temporal order is maintained across epochs. Moreover, the number of clusters in each epoch is unbounded: clusters can persist, die out, or emerge over time, and the parameterization of each cluster can also evolve over time in a Markovian fashion. We give a detailed and intuitive construction of this framework using the recurrent Chinese restaurant process (RCRP) metaphor, as well as a Gibbs sampling algorithm that carries out posterior inference in order to determine the optimal cluster evolution. We demonstrate our model on simulated data by using it to build an infinite dynamic mixture of Gaussian factors, and on a real dataset by using it to build a simple non-parametric dynamic clustering-topic model, which we apply to the NIPS12 document collection.
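The RCRP metaphor can be illustrated with a minimal sketch of the seating process alone. It assumes the first-order form, in which a customer in the current epoch joins an existing table with weight equal to that table's count in the previous epoch plus its count so far in the current epoch, and opens a new table with weight `alpha`. The function name `simulate_rcrp` and its parameters are hypothetical, and the sketch covers only cluster assignments, not the Markovian evolution of cluster parameters or the Gibbs sampler.

```python
import random
from collections import Counter


def simulate_rcrp(epoch_sizes, alpha, seed=0):
    """Sample table counts per epoch from a first-order RCRP prior (illustrative).

    epoch_sizes: number of customers (data points) arriving in each epoch.
    alpha: concentration parameter; larger values open new tables more often.
    Returns a list with one dict per epoch, mapping table id -> customer count.
    """
    rng = random.Random(seed)
    prev_counts = {}   # table counts from the previous epoch
    next_id = 0        # globally unique id for the next new table
    history = []
    for n in epoch_sizes:
        counts = Counter()
        for _ in range(n):
            # Candidate tables: those used in the previous or the current epoch.
            tables = sorted(set(prev_counts) | set(counts))
            weights = [prev_counts.get(k, 0) + counts[k] for k in tables]
            # The final option opens a new table with weight alpha.
            choice = rng.choices(tables + [next_id], weights=weights + [alpha])[0]
            if choice == next_id:
                next_id += 1
            counts[choice] += 1
        history.append(dict(counts))
        # Tables with no customers in this epoch are dropped, i.e. they die out.
        prev_counts = dict(counts)
    return history


if __name__ == "__main__":
    for t, counts in enumerate(simulate_rcrp([20, 30, 10], alpha=1.0, seed=1)):
        print(f"epoch {t}: {counts}")
```

Within an epoch the order of customers does not matter, while popular tables from the previous epoch are more likely to be reused, which is what produces temporally smooth cluster sizes in this sketch.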
