An Unsupervised Strategy for Event Prediction Using Social Networks

Since the popularization of text-based social media, many studies have used data from these platforms to predict real-world events. The common approach treats people posting on social media as sensors and their messages about an event as an indicator of its occurrence or intensity. Typically, a sentiment analysis step assigns each message to one of a set of predefined categories, which requires manually labeled data to train a classifier. However, manual labeling is costly, time-consuming, and subject to human error. In addition, it is sometimes difficult to decide which predefined category best fits a specific message. To address these issues, we propose an unsupervised methodology for handling social media data for event prediction. We apply it to dengue-related data from Brazil and show that an unsupervised approach can significantly improve event prediction performance in most cases.
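
The "social sensor" idea can be made concrete with a small Python sketch. It illustrates only the common approach described above, not the proposed unsupervised methodology, whose details are not given here. The sketch assumes messages have already been filtered to the event of interest (for example, by dengue-related keywords). It treats the weekly message count as the sensor reading and fits an ordinary least-squares line against weekly reported cases. The function names `weekly_counts` and `fit_line` and all numbers are hypothetical and serve as illustration only.

```python
from collections import Counter
from datetime import date


def weekly_counts(message_dates: list[date]) -> Counter:
    """Count messages per ISO (year, week): the 'sensor' reading for each week.

    Assumption: the input dates belong to messages already filtered to the
    event of interest; the filtering step itself is not shown.
    """
    return Counter(tuple(d.isocalendar()[:2]) for d in message_dates)


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x (single-feature regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    # Hypothetical toy numbers: weekly message counts and weekly reported
    # cases for the same four weeks (not data from the study).
    messages_per_week = [10, 20, 30, 40]
    cases_per_week = [25, 45, 65, 85]
    a, b = fit_line(messages_per_week, cases_per_week)
    print(a, b)          # 5.0 2.0
    print(a + b * 50)    # estimated cases for a week with 50 messages: 105.0
```

A single aggregate count keeps the example short. A supervised pipeline would instead split the count by classifier-assigned category, and an unsupervised one by groups found without labels. In both cases the per-group counts become the predictor features.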
