Leveraging Social Context for Modeling Topic Evolution

Topic discovery and evolution (TDE) is a problem of long-standing interest in the research community. The goal of topic discovery is to identify groups of keywords in large corpora so that the information in those corpora is summarized succinctly. The nature of text corpora has changed dramatically in the past few years with the advent of social media. Social media services allow users to constantly share, follow, and comment on posts from other users. Such services have thus given a new dimension to the traditional text corpus: today's corpora carry an embedded social context, such as the community of users interested in a particular post and their profiles. We wish to harness this social context that accompanies the textual content for TDE. In particular, our goal is to analyze, both qualitatively and quantitatively, when social context actually helps with TDE. Methodologically, we approach TDE by proposing a non-negative matrix factorization (NMF) based model that incorporates both textual information and social context information. We perform experiments on a large-scale real-world dataset of news articles, and use Twitter as the platform providing information about the social context of these news articles. We compare with and outperform several state-of-the-art baselines. Our conclusion is that social context information is most useful when faced with topics that are particularly difficult to detect.
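
To make the idea of an NMF model that combines text and social context more concrete, the following is a minimal sketch of a generic joint (collective) factorization, not the model proposed in this paper. It assumes two non-negative matrices over the same documents: a hypothetical document-by-term matrix `X_text` and a hypothetical document-by-user-feature matrix `X_social`. Both are factorized with a shared document-topic factor `W`, using standard multiplicative updates. The function name `joint_nmf`, the weight `alpha`, and all other names are illustrative assumptions.

```python
import numpy as np


def joint_nmf(X_text, X_social, k, alpha=1.0, n_iter=200, seed=0, eps=1e-9):
    """Sketch of a joint NMF: X_text ~ W @ H_text and X_social ~ W @ H_social.

    Minimizes ||X_text - W H_text||_F^2 + alpha * ||X_social - W H_social||_F^2
    with multiplicative updates. `alpha` (an assumed knob) weights the social view.
    """
    rng = np.random.default_rng(seed)
    n_docs = X_text.shape[0]
    # Random non-negative initialization of all factors.
    W = rng.random((n_docs, k))
    H_text = rng.random((k, X_text.shape[1]))
    H_social = rng.random((k, X_social.shape[1]))

    losses = []
    for _ in range(n_iter):
        # Update each view-specific topic matrix with W held fixed.
        H_text *= (W.T @ X_text) / (W.T @ W @ H_text + eps)
        H_social *= (W.T @ X_social) / (W.T @ W @ H_social + eps)

        # Update the shared document-topic factor using both views.
        numer = X_text @ H_text.T + alpha * (X_social @ H_social.T)
        denom = W @ (H_text @ H_text.T + alpha * (H_social @ H_social.T)) + eps
        W *= numer / denom

        # Track the weighted reconstruction error after this iteration.
        loss = (np.linalg.norm(X_text - W @ H_text) ** 2
                + alpha * np.linalg.norm(X_social - W @ H_social) ** 2)
        losses.append(loss)
    return W, H_text, H_social, losses
```

In this sketch, each row of `H_text` can be read as a topic (a weighting over terms), and the largest entries in a row give that topic's keywords. Setting `alpha=0` effectively ignores the social view when updating `W`, which offers one simple way to compare a text-only factorization against one that also uses social context.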
