A multi-feature probabilistic graphical model for social network semantic search

Abstract With the rapid development of social network platforms, more and more people use them to search for material related to their interests. Because social media messages are usually very short, traditional document modeling methods suffer from semantic sparsity when applied to social network search, which leads to low-quality semantic representations and low-precision search results. Fortunately, besides short text, social media data also carries other features, such as timestamps, locations, and user information. In light of this, to realize precise social network search, we propose a multi-feature probabilistic graphical model (MFPGM) that generates high-quality semantic representations. To deal with semantic sparsity, MFPGM exploits two strategies. First, we propose a concept named special region and use location information to aggregate short texts into long texts. Second, we introduce the biterm pattern, which yields a dense semantic space by assuming that the words of a biterm occurring in the same context share the same topic. To generate high-quality semantic representations, we simultaneously model multiple features (i.e., biterm, user, location, and timestamp) of social network data to enhance the semantic learning process of MFPGM. We conduct extensive experiments on real-world datasets, and comparisons with several state-of-the-art baseline methods demonstrate the superiority of MFPGM in topic quality and search performance. Additionally, with the help of the generated semantic representations, MFPGM allows people to analyze the relationship between time and the popularity of topics.
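
The two sparsity strategies can be illustrated with a minimal Python sketch. It is not the MFPGM implementation: the abstract does not define how a special region is constructed, so the fixed latitude/longitude grid cell used here is purely an assumption, and the names `aggregate_by_region`, `extract_biterms`, and `cell_size` are hypothetical. The biterm helper follows the usual definition of a biterm as an unordered word pair co-occurring in one short context.

```python
import math
from collections import defaultdict
from itertools import combinations


def aggregate_by_region(messages, cell_size=1.0):
    """Pool the tokens of short messages that fall into the same grid cell.

    messages: iterable of (text, latitude, longitude) tuples.
    The grid cell is an assumed stand-in for the paper's "special region".
    """
    regions = defaultdict(list)
    for text, lat, lon in messages:
        # Assumption: a region is a square cell of cell_size degrees.
        key = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        regions[key].extend(text.lower().split())
    return dict(regions)


def extract_biterms(tokens):
    """Return every unordered word pair from one short context."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


messages = [
    ("coffee downtown", 40.2, -74.3),
    ("great coffee", 40.7, -74.9),
    ("beach sunset", 34.1, -118.5),
]
pseudo_docs = aggregate_by_region(messages)
biterms = extract_biterms(["great", "coffee", "shop"])
```

Here the first two messages land in the same cell and are merged into one longer pseudo-document, while a three-word message yields three biterms. A real model would then infer topics over these biterms jointly with user, location, and timestamp features.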
