Learning Evolving and Emerging Topics in Social Media: A Dynamic NMF Approach with Temporal Regularization

As massive repositories of real-time human commentary, social media platforms have arguably evolved far beyond passive facilitation of online social interactions. Rapid analysis of the information content in online social media streams (news articles, blogs, tweets, etc.) is increasingly important because it allows businesses and government bodies to understand public opinion about products and policies. In most of these settings, data points arrive as a stream of high-dimensional feature vectors. Guided by real-world industrial deployment scenarios, we revisit the problem of online learning of topics from streaming social media content. On the one hand, the topics need to be dynamically adapted to the statistics of incoming data points; on the other hand, early detection of rising new trends is important in many applications. We propose an online nonnegative matrix factorization (NMF) framework that captures the evolution and emergence of themes in unstructured text through a novel form of temporal regularization. We develop scalable optimization algorithms for our framework, propose a new set of evaluation metrics, and report promising empirical results on traditional topic detection and tracking (TDT) tasks as well as on streaming Twitter data. Our system rapidly captures emerging themes, tracks existing topics over time while maintaining temporal consistency and continuity in user views, and can be explicitly configured to bound the amount of information presented to the user.
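To make the idea of temporally regularized NMF concrete, the following is a minimal sketch and not the algorithm developed in the paper, whose objective and solvers are not given in the abstract. It assumes one simple form of regularization: a quadratic penalty `mu * ||H - H_prev||_F^2` that keeps the current topic matrix `H` close to the topics `H_prev` learned on the previous batch, minimized with standard multiplicative updates. The names `temporal_nmf`, `objective`, `mu`, and `H_prev` are hypothetical, and the sketch covers only topic evolution, not the detection of emerging topics.

```python
import numpy as np


def temporal_nmf(X, H_prev, mu=1.0, n_iter=200, eps=1e-9, seed=0):
    """Approximately minimize ||X - W H||_F^2 + mu * ||H - H_prev||_F^2
    subject to W >= 0 and H >= 0, using multiplicative updates.

    X      : (documents x terms) nonnegative matrix for the current batch
    H_prev : (topics x terms) nonnegative topics from the previous batch
    mu     : assumed temporal-regularization weight (hypothetical parameter)
    """
    rng = np.random.default_rng(seed)
    n_topics = H_prev.shape[0]
    W = rng.random((X.shape[0], n_topics))  # random nonnegative start
    H = H_prev.copy() + eps                 # warm start from the old topics
    for _ in range(n_iter):
        # Plain NMF update for the document-topic weights.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Topic update: the mu terms pull H toward H_prev.
        H *= (W.T @ X + mu * H_prev) / (W.T @ W @ H + mu * H + eps)
    return W, H


def objective(X, W, H, H_prev, mu):
    """Value of the assumed regularized objective."""
    fit = np.linalg.norm(X - W @ H) ** 2
    drift = np.linalg.norm(H - H_prev) ** 2
    return fit + mu * drift


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((30, 12))       # synthetic batch: 30 documents, 12 terms
    H_prev = rng.random((3, 12))   # synthetic topics from the previous batch
    for mu in (0.01, 100.0):
        W, H = temporal_nmf(X, H_prev, mu=mu)
        print(mu, np.linalg.norm(H - H_prev), objective(X, W, H, H_prev, mu))
```

In this sketch a larger `mu` trades reconstruction accuracy for temporal continuity: the topics move less between batches, which is one way to read the abstract's goal of maintaining consistency in user views.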

[1]  穂鷹 良介 Non-Linear Programming の計算法について , 1963 .

[2]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[3]  Panos M. Pardalos,et al.  An algorithm for a singly constrained class of quadratic programs subject to upper and lower bounds , 1990, Math. Program..

[4]  Yiming Yang,et al.  A study of retrospective and on-line event detection , 1998, SIGIR '98.

[5]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[6]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[7]  James Allan,et al.  UMass at TDT 2000 , 2000 .

[8]  James Allan,et al.  Topic detection and tracking: event-based information organization , 2002 .

[9]  Xin Liu,et al.  Document clustering based on non-negative matrix factorization , 2003, SIGIR.

[10]  Ata Kabán,et al.  On an equivalence between PLSI and LDA , 2003, SIGIR.

[11]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[12]  James Allan,et al.  UMass at TDT 2004 , 2004 .

[13]  John D. Lafferty,et al.  Dynamic topic models , 2006, ICML.

[14]  Meng Chang Chen,et al.  Using Incremental PLSI for Threshold-Resilient Online Event Analysis , 2008, IEEE Transactions on Knowledge and Data Engineering.

[15]  Daniel Barbará,et al.  On-line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[16]  Paul Van Dooren,et al.  Descent methods for Nonnegative Matrix Factorization , 2008, ArXiv.

[17]  Chris H. Q. Ding,et al.  On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing , 2008, Comput. Stat. Data Anal..

[18]  Prem Melville Social Media Analytics: Channeling the Power of the Blogosphere for Marketing Insight , 2009 .

[19]  Myra Spiliopoulou,et al.  Topic Evolution in a Stream of Documents , 2009, SDM.

[20]  Andrzej Cichocki,et al.  Nonnegative Matrix and Tensor Factorization T , 2007 .

[21]  Francis R. Bach,et al.  Online Learning for Latent Dirichlet Allocation , 2010, NIPS.

[22]  Martin Jaggi,et al.  A Simple Algorithm for Nuclear Norm Regularized Problems , 2010, ICML.

[23]  Michael Elad,et al.  Sparse and Redundant Representations - From Theory to Applications in Signal and Image Processing , 2010 .

[24]  Aron Culotta,et al.  Towards detecting influenza epidemics by analyzing Twitter messages , 2010, SOMA '10.

[25]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[26]  Feng Qianjin,et al.  Projected gradient methods for Non-negative Matrix Factorization based relevance feedback algorithm in medical image retrieval , 2011 .