Intelligent Twitter Data Analysis Based on Nonnegative Matrix Factorizations

In this paper we address the problem of intelligently analyzing Twitter data. We propose a novel workflow based on Nonnegative Matrix Factorization (NMF) to collect, organize and analyze Twitter data. The workflow first fetches tweets from Twitter (according to some search criteria) and processes them using text mining techniques; it then extracts latent features from the tweets using NMF, and finally it clusters the tweets and extracts human-interpretable topics. We report some preliminary experiments demonstrating the effectiveness of the proposed workflow as a tool for Intelligent Data Analysis (IDA): it is able to extract and visualize interpretable topics from some newly collected Twitter datasets, whose tweets are automatically grouped together according to these topics. Furthermore, we numerically investigate the influence of different initialization mechanisms for NMF algorithms on the factorization results when very sparse Twitter data are considered. The numerical comparisons confirm that NMF algorithms can be used as a clustering method in place of the well-known k-means.
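The factorize-then-cluster stage of such a workflow can be illustrated with a minimal sketch. It assumes a toy list of already-cleaned tweets (hypothetical data, not one of the paper's datasets), raw term counts instead of any particular text mining weighting, random initialization, and the classical multiplicative updates of Lee and Seung; the abstract does not state which NMF algorithm or initialization the authors actually use, so every name below is illustrative only.

```python
import random
from collections import Counter

# Hypothetical, already-cleaned tweets standing in for fetched data.
tweets = [
    "goal match football team",
    "football team wins match",
    "team scores goal football",
    "election vote president campaign",
    "president campaign vote debate",
    "vote election debate president",
]

def term_document_matrix(docs):
    """Build a terms-by-documents count matrix and its vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w, c in Counter(d.split()).items():
            X[index[w]][j] = float(c)
    return X, vocab

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Approximate X by W @ H (both nonnegative) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Random positive initialization (an assumption; other seedings exist).
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

X, vocab = term_document_matrix(tweets)
W, H = nmf(X, k=2)

# Cluster each tweet by its dominant latent feature (largest entry in its H column).
labels = [max(range(len(H)), key=lambda i: H[i][j]) for j in range(len(tweets))]

# Describe each topic by its three highest-weighted terms in the matching W column.
topics = [
    [vocab[t] for t in sorted(range(len(vocab)), key=lambda t: -W[t][i])[:3]]
    for i in range(len(H))
]

for j, tweet in enumerate(tweets):
    print(labels[j], tweet)
for i, terms in enumerate(topics):
    print("topic", i, terms)
```

Reading cluster labels directly from the coefficient matrix is what lets NMF stand in for k-means here: the columns of W double as topic descriptions, so the grouping and its interpretation come from the same factorization.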
