Targeted aspects oriented topic modeling for short texts

Topic modeling has proven valuable for topic discovery in short texts. Most topic models approach this task by analyzing the whole corpus to find all possible topics. However, such models overlook finer-grained topics, so the topics they discover are often coarse and confusing. In practice, people usually want focused topics on particular aspects (or events) rather than a set of coarse topics. In this paper, we therefore propose a novel method, Targeted Aspects Oriented Topic Modeling (TATM), to discover more focused topics on specific aspects in short texts. Specifically, each short text is assigned to exactly one targeted aspect derived from an enhanced Dirichlet Multinomial Mixture process (E-DMM). This process groups as many similar words together as possible, which yields topic homogeneity. In addition, TATM performs target-level modeling to discover the topics of each targeted aspect from as many angles as possible, which yields topic completeness. TATM thus balances these two conflicting properties without using any additional information or pre-trained knowledge. Extensive experiments on five real-world datasets demonstrate that the proposed model effectively discovers more focused and complete topics and outperforms state-of-the-art baselines.
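To make the "one aspect per short text" assumption concrete, the following is a minimal sketch of collapsed Gibbs sampling for the standard Dirichlet Multinomial Mixture model, in which every document receives exactly one cluster label. It is not the enhanced process (E-DMM) proposed here, whose details the abstract does not give, and it omits the target-level modeling step. The function name `dmm_gibbs`, its parameters, and the toy corpus are illustrative assumptions.

```python
import math
import random
from collections import Counter


def dmm_gibbs(docs, num_clusters, alpha=0.1, beta=0.1, iterations=30, seed=0):
    """Standard DMM collapsed Gibbs sampler (hypothetical helper, not E-DMM).

    docs is a list of token lists; returns one cluster label per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_count = [0] * num_clusters                          # documents per cluster
    word_total = [0] * num_clusters                         # tokens per cluster
    word_count = [Counter() for _ in range(num_clusters)]   # per-word counts per cluster
    labels = []

    def move(doc, k, sign):
        # Add (sign=+1) or remove (sign=-1) a document from cluster k.
        doc_count[k] += sign
        word_total[k] += sign * len(doc)
        for w in doc:
            word_count[k][w] += sign

    # Random initialisation: each document starts in exactly one cluster.
    for doc in docs:
        k = rng.randrange(num_clusters)
        labels.append(k)
        move(doc, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            move(doc, labels[d], -1)
            log_weights = []
            for k in range(num_clusters):
                # Cluster popularity term; the denominator is constant over k.
                lw = math.log(doc_count[k] + alpha)
                seen = Counter()
                for i, w in enumerate(doc):
                    # Word-similarity term, handling repeated words in the document.
                    lw += math.log(word_count[k][w] + beta + seen[w])
                    lw -= math.log(word_total[k] + vocab_size * beta + i)
                    seen[w] += 1
                log_weights.append(lw)
            top = max(log_weights)
            weights = [math.exp(lw - top) for lw in log_weights]
            labels[d] = rng.choices(range(num_clusters), weights=weights)[0]
            move(doc, labels[d], +1)
    return labels


# Toy corpus (made up for illustration only).
toy_docs = [
    ["apple", "banana"],
    ["banana", "apple", "fruit"],
    ["goal", "match"],
    ["match", "team", "goal"],
]
toy_labels = dmm_gibbs(toy_docs, num_clusters=3)
```

Because a document's words all share one label, clusters that already contain a document's words become more likely to attract it, which is the mechanism behind the homogeneity property described above.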
