Characterization of citizens using word2vec and latent topic analysis in a large set of tweets

With the increasing use of the Internet and mobile devices, social networks are becoming the most widely used medium for citizens to communicate their ideas and thoughts. This information can be used to identify communities that share common ideas, based on what their members publish on the network. This paper presents a method to automatically detect city communities using machine learning techniques applied to a set of tweets from Bogotá's citizens. The analysis was performed on a collection of 2,634,176 tweets gathered from Twitter over a period of six months. The results suggest that the proposed method is a useful tool for characterizing a city's population using machine learning methods and text analytics.
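The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea of grouping tweets by the vocabulary they share. It assumes a toy corpus and uses two stand-ins that are not taken from the paper: word co-occurrence counts in place of trained word2vec embeddings, and a small cosine k-means in place of whatever grouping step the authors used. All function names, the stopword list, and the example tweets are hypothetical.

```python
from collections import Counter, defaultdict
import math

# Hypothetical stopword list, kept tiny for the toy example.
STOPWORDS = {"the", "on", "and", "at", "in", "a"}

def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopwords."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def cooccurrence_vectors(docs):
    """Map each word to counts of words seen in the same tweet.
    This is a crude stand-in for word2vec, not word2vec itself."""
    vecs = defaultdict(Counter)
    for tokens in docs:
        for w in tokens:
            for c in tokens:
                vecs[w][c] += 1
    return vecs

def mean_vector(tokens, word_vecs):
    """Represent a tweet as the average of its word vectors."""
    total = Counter()
    for w in tokens:
        total.update(word_vecs[w])
    n = max(len(tokens), 1)
    return {k: v / n for k, v in total.items()}

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = defaultdict(float)
    for v in vectors:
        for k, x in v.items():
            total[k] += x
    return {k: x / len(vectors) for k, x in total.items()}

def kmeans(vectors, k, iters=10):
    """Cosine k-means with deterministic farthest-first initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        # Pick the vector least similar to every center chosen so far.
        centers.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels

# Toy tweets (invented for illustration, not from the paper's dataset).
tweets = [
    "traffic jam on the main avenue again",
    "bus delays and traffic on the avenue",
    "great concert at the park tonight",
    "live music and concert in the park",
]
docs = [tokenize(t) for t in tweets]
word_vecs = cooccurrence_vectors(docs)
tweet_vecs = [mean_vector(d, word_vecs) for d in docs]
labels = kmeans(tweet_vecs, k=2)
print(labels)
```

On real data, the co-occurrence step would normally be replaced by trained embeddings and the number of groups would need to be chosen empirically; both choices are outside what the abstract states.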
