Word Frequency and Sentiment Analysis of Twitter Messages During the Coronavirus Pandemic

The Coronavirus pandemic has taken the world, and social media along with it, by storm. As awareness of the disease increased, so did the number of messages, videos and posts acknowledging its presence. The social networking site Twitter showed a similar effect, with the number of coronavirus-related posts growing at an unprecedented rate in a very short span of time. This paper presents a statistical analysis of Twitter messages related to the disease posted since January 2020. Two types of empirical study have been performed: the first on word frequency and the second on the sentiment of individual tweets. Inspecting word frequency is useful for characterizing patterns or trends in the words used on the site, and it may also reflect the psychology of Twitter users at this critical juncture. Unigram, bigram and trigram frequencies have been modeled by a power law distribution. The results have been validated using the Sum of Squared Errors (SSE), the coefficient of determination (R²) and the Root Mean Square Error (RMSE). High values of R² and low values of SSE and RMSE support the goodness of fit of this model. Sentiment analysis has been conducted to understand the general attitudes of Twitter users during this period. The corpus included tweets from both the general public and the World Health Organization (WHO). The results showed that the majority of the tweets had positive polarity and only about 15% were negative.
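The word-frequency study described above can be sketched as follows: count unigrams, bigrams and trigrams, rank them by frequency, fit a power law f(r) = a * r^b to frequency versus rank, and score the fit with SSE, R² and RMSE. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The tokenizer, the log-log least-squares fitting method, and the helper names (`tokenize`, `ngram_counts`, `fit_power_law`, `goodness_of_fit`) are hypothetical and chosen for illustration; the abstract does not specify how the fit was performed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and keep runs of letters, digits and
    # apostrophes; real tweets would also need URL, @mention and #hashtag handling.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(tweets, n):
    # Count n-grams (n=1 unigrams, n=2 bigrams, n=3 trigrams) inside each tweet,
    # so no n-gram spans two different tweets.
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def fit_power_law(freqs):
    # Fit f(r) = a * r**b to frequencies sorted in descending order (rank r = 1, 2, ...)
    # by ordinary least squares on log f = log a + b * log r.
    # This fitting method is an assumption, not the paper's stated procedure.
    if len(freqs) < 2:
        raise ValueError("need at least two ranked frequencies to fit")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # exponent (negative for Zipf-like data)
    a = math.exp(my - b * mx)  # scale factor
    return a, b


def goodness_of_fit(observed, predicted):
    # SSE, R^2 and RMSE computed on the original (not log) frequency scale.
    n = len(observed)
    mean = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    r2 = 1 - sse / sst if sst > 0 else float("nan")  # undefined if all counts are equal
    rmse = math.sqrt(sse / n)  # some tools divide by n - 2 (residual degrees of freedom)
    return sse, r2, rmse


if __name__ == "__main__":
    # Hypothetical toy tweets for illustration only, not data from the study.
    tweets = [
        "stay home stay safe",
        "stay safe and wash your hands",
        "wash your hands and stay home",
        "coronavirus cases rise stay home",
    ]
    for n, name in ((1, "unigram"), (2, "bigram"), (3, "trigram")):
        freqs = sorted(ngram_counts(tweets, n).values(), reverse=True)
        a, b = fit_power_law(freqs)
        predicted = [a * r ** b for r in range(1, len(freqs) + 1)]
        sse, r2, rmse = goodness_of_fit(freqs, predicted)
        print(f"{name}: f(r) = {a:.2f} * r^{b:.3f}  SSE={sse:.3f}  R^2={r2:.3f}  RMSE={rmse:.3f}")
```

Taking logarithms turns the power law into a straight line, log f = log a + b log r, so a closed-form linear regression is enough for this sketch. A nonlinear least-squares fit on the raw frequencies is a common alternative and can yield somewhat different parameters and error values.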
