On Nonnegative Matrix and Tensor Decompositions for COVID-19 Twitter Dynamics

We analyze Twitter data related to the COVID-19 pandemic using dynamic topic modeling techniques that learn topics and their prevalence over time. Topics are learned using four methods: nonnegative matrix factorization (NMF), nonnegative CP tensor decomposition (NCPD), online NMF, and online NCPD. All four methods discover major topics that persist for multiple weeks, relating to China, social distancing, and U.S. President Trump. Topics about China dominate in early February before giving way to more diverse topics. NCPD and online NCPD can also detect topics that are prevalent for only a few days, such as the outbreak in South Korea, whereas the topics detected by NMF and online NMF are prevalent over longer periods of time. We validate our results against external news sources.
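As a rough illustration of the NMF-based approach, the sketch below factors a small word-by-tweet matrix into nonnegative topic and prevalence factors, then averages the prevalence factor per day. It is not the pipeline used in this work: the toy matrix, vocabulary, day labels, the function name `nmf`, and the choice of multiplicative updates as the solver are all assumptions made for the example.

```python
import numpy as np

def nmf(X, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate X (words x tweets) by W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the Frobenius loss;
    this solver is an illustrative choice, not the paper's.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = X.shape
    W = rng.random((n_words, rank)) + eps   # word weights per topic
    H = rng.random((rank, n_docs)) + eps    # topic weights per tweet
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: rows are words, columns are tweets ordered by day.
vocab = ["china", "wuhan", "distancing", "home"]
X = np.array([
    [3, 6, 3, 0, 0, 0],
    [2, 4, 2, 0, 0, 0],
    [0, 0, 0, 4, 2, 2],
    [0, 0, 0, 6, 3, 3],
], dtype=float)
days = np.array([0, 0, 0, 1, 1, 1])  # assumed day label of each tweet

W, H = nmf(X, rank=2)

# Top words of each learned topic (columns of W).
for k in range(W.shape[1]):
    top = [vocab[i] for i in np.argsort(W[:, k])[::-1][:2]]
    print(f"topic {k}: {top}")

# Topic prevalence over time: mean topic weight of the tweets in each day.
prevalence = np.stack(
    [H[:, days == d].mean(axis=1) for d in np.unique(days)], axis=1
)
print(prevalence)  # rows are topics, columns are days
```

In this toy setting each topic should concentrate on one day, which mirrors the idea of tracking when a topic is prevalent. A tensor method such as NCPD would instead factor a three-way array with an explicit time mode rather than averaging after the fact.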
