Identifying Topical Shifts in Twitter Streams: An Integration of Non-negative Matrix Factorisation, Sentiment Analysis and Structural Break Models for Large Scale Data

We propose an integration of Non-negative Matrix Factorisation, sentiment analysis and structural break models to identify significant topical shifts on the social media platform Twitter. For the topic modelling, we compare Latent Dirichlet Allocation and Non-negative Matrix Factorisation in terms of their applicability to short text documents. Sentiment is extracted with the rule-based VADER model. Structural breaks in the relative frequency and daily sentiment of each topic over time are identified with the Bai-Perron model. Combining these methods, we provide an easy-to-use exploratory tool for social scientists to study the discourse on Twitter over time. Detecting statistically significant shifts in topics enables researchers to perform statistical inference and test hypotheses about that discourse. The framework is implemented efficiently so that it can be run on average consumer hardware in a reasonable amount of time. A case study with COVID-19 related tweets in the UK is provided. We validate the method by using the timestamps of these tweets to link the detected topical shifts to real-world events. © 2021, Springer Nature Switzerland AG.
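The structural break step can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a pure mean-shift model for a single daily topic share series and a number of breaks fixed in advance, and it only dates the breaks by least squares with dynamic programming, in the spirit of the Bai-Perron approach. The significance tests and the selection of the number of breaks are left out. The function names `make_sse` and `date_breaks` and the toy series are hypothetical.

```python
from itertools import accumulate


def make_sse(y):
    """Return sse(i, j): squared deviations of y[i:j] from its own mean."""
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def sse(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return sse


def date_breaks(y, n_breaks, min_size):
    """Least-squares break dating for a mean-shift model (illustrative only).

    Returns the sorted indices at which a new segment starts. Every segment
    has at least `min_size` observations. No significance testing is done.
    """
    n = len(y)
    if n < (n_breaks + 1) * min_size:
        raise ValueError("series too short for the requested segmentation")
    sse = make_sse(y)
    inf = float("inf")
    # best[k][j]: minimal total SSE when y[:j] is split into k + 1 segments
    best = [[inf] * (n + 1) for _ in range(n_breaks + 1)]
    arg = [[-1] * (n + 1) for _ in range(n_breaks + 1)]
    for j in range(min_size, n + 1):
        best[0][j] = sse(0, j)
    for k in range(1, n_breaks + 1):
        for j in range((k + 1) * min_size, n + 1):
            for i in range(k * min_size, j - min_size + 1):
                cost = best[k - 1][i] + sse(i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    arg[k][j] = i
    # Walk back from the end of the series to recover the break positions.
    breaks = []
    j = n
    for k in range(n_breaks, 0, -1):
        j = arg[k][j]
        breaks.append(j)
    return sorted(breaks)


# Made-up daily share of one topic; the level changes at positions 10 and 20.
toy_share = [0.1] * 10 + [0.5] * 10 + [0.2] * 10
print(date_breaks(toy_share, n_breaks=2, min_size=3))
```

The same routine could be applied to a daily mean sentiment series per topic. In practice the number of breaks would be chosen with tests or an information criterion rather than fixed by hand.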