Embed2Detect: temporally clustered embedded words for event detection in social media

Event detection in social media refers to the automatic identification of important information shared on social media platforms at a certain time. Given the dynamic nature and high volume of data streams, filtering events manually is impractical. An automated detection mechanism is therefore needed to utilise social media data effectively. According to the available literature, most existing event detection methods focus only on statistical and syntactical features of the data, even though the underlying semantics are also important for effective information retrieval from text, because they describe the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media, which combines the characteristics of prediction-based word embeddings and hierarchical agglomerative clustering. The adoption of prediction-based word embeddings incorporates the semantic features of the text and thereby overcomes a major limitation of previous approaches. We evaluate the method on two recent social media data sets representing the sports and politics domains. The experimental results show that our approach detects events effectively and efficiently, with significant improvements over the baselines. On the sports data set, Embed2Detect achieved a 30% higher F-measure than the best-performing baseline method, and on the political data set the increase was 36%.
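To make the clustering idea concrete, the following is a minimal sketch of hierarchical agglomerative clustering over word vectors. It is not the Embed2Detect implementation: the three-dimensional vectors are invented toy values standing in for learned prediction-based embeddings, and the choices of cosine distance, average linkage, and a distance threshold of 0.2 are assumptions made purely for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(c1, c2, vectors):
    """Mean pairwise cosine distance between the words of two clusters."""
    dists = [cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)


def agglomerative_clusters(vectors, max_distance):
    """Merge the closest pair of clusters until none is within max_distance."""
    clusters = [[word] for word in vectors]  # start with one cluster per word
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (i, j), dist = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j], vectors))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break  # remaining clusters are too far apart to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical toy vectors (assumption): real embeddings would be learned
# from the social media text and have far more dimensions.
toy_vectors = {
    "goal": [0.9, 0.1, 0.0],
    "score": [0.8, 0.2, 0.1],
    "penalty": [0.7, 0.3, 0.0],
    "vote": [0.1, 0.9, 0.2],
    "election": [0.0, 0.8, 0.3],
}

clusters = agglomerative_clusters(toy_vectors, max_distance=0.2)
print(sorted(sorted(c) for c in clusters))
```

With these toy values, the sports-related words and the politics-related words end up in separate clusters, which illustrates how semantically related words can be grouped without relying on term frequency alone.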
