A novel sentence embedding based topic detection method for micro-blog

Topic detection is a challenging task, especially when the number of topics is not known in advance. In this article, we present a novel neural approach to detecting topics in a microblogging dataset. We use an unsupervised neural sentence embedding model to map blogs into an embedding space. The proposed model is a weighted power mean sentence embedding model in which the weights are calculated by a targeted attention mechanism. Experimental results show that our embedding model outperforms the baseline in sentence clustering. In addition, we propose a clustering algorithm, referred to as Relationship-Aware DBSCAN (RADBSCAN), that discovers topics in a microblogging dataset and determines the number of topics automatically from the characteristics of the dataset. Moreover, to make the algorithm less sensitive to its parameters, we use the forwarding relationship between blogs as a bridge between two otherwise independent clusters. Finally, we validate the proposed method on a dataset from the Sina microblog. The results show that our approach successfully detects all topics in this dataset and extracts the keywords of each topic.
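
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the attention scoring function, the set of power-mean exponents, or how RADBSCAN uses forwarding links, so the sketch assumes a softmax over dot products with a hypothetical target vector, the exponents 1, +inf and -inf, and a simple post-hoc merge of cluster labels that are already available from some density-based clustering step. All function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(word_vecs, target):
    """ASSUMPTION: weight = softmax of each word vector's dot product with a target vector."""
    return softmax([sum(a * b for a, b in zip(v, target)) for v in word_vecs])


def weighted_power_mean(word_vecs, weights, p):
    """Per-dimension weighted power mean for p in {1, +inf, -inf}."""
    dims = list(zip(*word_vecs))  # one tuple of values per embedding dimension
    if p == 1:
        return [sum(w * x for w, x in zip(weights, d)) for d in dims]
    if p == math.inf:   # limit p -> +inf is the maximum; weights drop out
        return [max(d) for d in dims]
    if p == -math.inf:  # limit p -> -inf is the minimum; weights drop out
        return [min(d) for d in dims]
    raise ValueError("this sketch only supports p = 1, +inf, -inf")


def sentence_embedding(word_vecs, target, powers=(1, math.inf, -math.inf)):
    """Concatenate the weighted power means for every exponent in `powers`."""
    weights = attention_weights(word_vecs, target)
    out = []
    for p in powers:
        out.extend(weighted_power_mean(word_vecs, weights, p))
    return out


def merge_by_forwarding(labels, forwards):
    """Merge clusters linked by a forwarding pair (i, j); label -1 is noise and is left alone.

    ASSUMPTION: `labels` come from a prior density-based clustering step, and
    merging is done afterwards with a union-find over cluster labels.
    """
    parent = {c: c for c in set(labels) if c != -1}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for i, j in forwards:
        a, b = labels[i], labels[j]
        if a != -1 and b != -1:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)  # keep the smaller label
    return [c if c == -1 else find(c) for c in labels]


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 2.0]]           # two toy 2-d word vectors
    print(sentence_embedding(words, [0.0, 0.0]))
    print(merge_by_forwarding([0, 0, 1, 1, 2, -1], [(1, 2), (4, 5)]))
```

In this sketch a forwarding pair only merges two clusters when both blogs already belong to a cluster, so a forward to or from a noise point changes nothing. Whether the proposed algorithm treats noise points this way is not stated in the abstract.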
