A Metamodel Enabled Approach for Discovery of Coherent Topics in Short Text Microblogs

Comprehending social media discussions in short-text microblogs is fundamental for knowledge-based applications such as recommender systems. Twitter, for example, provides rich real-time information through its streaming nature. Making sense of such data without automated support is not feasible because of its volume and streaming nature. The problem becomes harder when the data have low topical diversity. An automatic method for understanding textual patterns in such topically constrained data is therefore needed. A major challenge in building such a system is handling the diversity of word-structure correlations, the sparsity of the vocabulary, and the factors that distinguish the generated topics from one another. In this paper, we present a novel semi-supervised approach, metamodel-enabled latent Dirichlet allocation, to address this challenge. In contrast to state-of-the-art approaches, our model incorporates a domain-specific metamodel, defined as a set of topic label vectors derived from long texts that guide the learning process in shorter texts.
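The idea of a metamodel guiding topic learning can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given here. It assumes, purely for illustration, that each topic label vector is the list of most frequent words in long texts carrying that label, and that guidance takes the form of an asymmetric Dirichlet prior over the vocabulary. The function names `build_metamodel` and `topic_word_prior` and the parameters `top_n`, `base`, and `boost` are hypothetical.

```python
from collections import Counter


def build_metamodel(labeled_long_texts, top_n=3):
    """Map each topic label to its top_n most frequent words (assumed construction)."""
    metamodel = {}
    for label, docs in labeled_long_texts.items():
        counts = Counter(word for doc in docs for word in doc.lower().split())
        metamodel[label] = [word for word, _ in counts.most_common(top_n)]
    return metamodel


def topic_word_prior(metamodel, vocabulary, base=0.01, boost=1.0):
    """Build one prior row per label, with extra mass on that label's words."""
    prior = {}
    for label, words in metamodel.items():
        seeds = set(words)
        prior[label] = [base + boost if w in seeds else base for w in vocabulary]
    return prior


# Toy long texts grouped by topic label (made-up data for illustration only).
long_texts = {
    "sports": ["football match football goal", "goal football team"],
    "politics": ["election vote election", "vote election policy"],
}
vocabulary = ["football", "goal", "election", "vote", "tweet"]

metamodel = build_metamodel(long_texts, top_n=2)
prior = topic_word_prior(metamodel, vocabulary)
for label, row in prior.items():
    print(label, row)
```

A prior of this shape could then be passed to an LDA sampler run on the short texts, so that each topic starts biased toward the words of its label while remaining free to absorb other vocabulary.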
