An Adaptive Dirichlet Multinomial Mixture Model for Short Text Streaming Clustering

In this paper, we propose an adaptive Dirichlet Multinomial Mixture (DMM) model for clustering short texts across time slices. A hyperparameter-adjustment algorithm captures the temporal dynamics automatically, and a collapsed Gibbs sampling algorithm for the extended DMM model (the e-GSDMM algorithm) infers the changes in topic and word distributions across time slices. Extensive experiments on three datasets show that the proposed model is efficient and outperforms the existing GSDMM approach for short text clustering on streaming data.
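
For orientation, the following is a minimal sketch of collapsed Gibbs sampling for a plain DMM, i.e. the GSDMM baseline that the abstract compares against, not the proposed e-GSDMM. The abstract does not specify how hyperparameters are adjusted per time slice, so that step is omitted here. The function name `gsdmm`, its parameters, and the default values of `K`, `alpha`, and `beta` are illustrative assumptions.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Sketch of collapsed Gibbs sampling for a DMM (one cluster per document).

    docs: list of token lists. Returns one cluster index per document.
    All names and defaults are assumptions made for illustration.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})   # vocabulary size
    m = [0] * K                             # number of documents in each cluster
    n = [0] * K                             # number of tokens in each cluster
    nw = [Counter() for _ in range(K)]      # per-cluster word counts
    z = []

    # Random initial assignment.
    for d in docs:
        k = rng.randrange(K)
        z.append(k)
        m[k] += 1
        n[k] += len(d)
        nw[k].update(d)

    for _ in range(n_iters):
        for i, d in enumerate(docs):
            # Remove document i from its current cluster.
            k = z[i]
            m[k] -= 1
            n[k] -= len(d)
            nw[k].subtract(d)

            # Log of the unnormalised conditional for each cluster; the
            # factor 1 / (D - 1 + K * alpha) is the same for every cluster
            # and is therefore dropped.
            counts = Counter(d)
            logp = []
            for c in range(K):
                lp = math.log(m[c] + alpha)
                for w, f in counts.items():
                    for j in range(f):
                        lp += math.log(nw[c][w] + beta + j)
                for j in range(len(d)):
                    lp -= math.log(n[c] + V * beta + j)
                logp.append(lp)

            # Sample a new cluster, exponentiating stably.
            mx = max(logp)
            weights = [math.exp(x - mx) for x in logp]
            k = rng.choices(range(K), weights=weights)[0]

            # Add document i to the sampled cluster.
            z[i] = k
            m[k] += 1
            n[k] += len(d)
            nw[k].update(d)
    return z
```

In a streaming setting, one plausible use is to call such a sampler once per time slice, with priors informed by the previous slice; how the proposed model actually does this is not stated in the abstract.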
