Investigating the Relationship between Social Media Content and Real-time Observations for Urban Air Quality and Public Health

The rapid rise of Web 2.0 technologies and the popularity of social media, together with the broad use of low-cost smart devices, have dramatically changed the way users receive information and have also enabled them to become significant contributors of disseminated data. The challenge now is to benefit from these large volumes of data and from collective intelligence, so as to detect what people think or discuss in virtual communities at the time an event happens or information spreads. Our domain of interest is urban air quality (UAQ) and public health. We want to promote the potential use of social media as a real-time source for "sensing" the environmental load, or the existing environmental conditions, that directly affects people's quality of life. Using the Self-Organizing Map (SOM), we analyze posts gathered from Twitter and identify existing UAQ conditions based on users' reports, forming clusters of tweets with similar topics of discussion. We additionally investigate the relations between citizens' reports and the actual observations of specific environmental characteristics that correspond to them in time and location. From a thorough investigation of the SOM visualizations, we conclude that there is a positive correlation between personal observations and official data, which highlights the agreement between the measurements of soft sensors (users) and hard sensors (monitoring sites).
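As a rough illustration of how a SOM can group short posts by topic, the sketch below trains a tiny map on bag-of-words vectors. It is not the authors' implementation: the example tweets, the 2x2 grid, the linear decay schedules, and all function names (`vectorize`, `train_som`, `best_matching_unit`) are assumptions made for this example, and the abstract does not describe the text preprocessing actually used.

```python
import math
import random


def vectorize(texts):
    """Turn texts into bag-of-words count vectors over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for w in t.lower().split():
            v[index[w]] += 1.0
        vectors.append(v)
    return vocab, vectors


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((wi - xi) ** 2 for wi, xi in zip(weights[pos], x)),
    )


def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM with a Gaussian neighbourhood and linear decay."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # learning rate shrinks over time
        sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood radius shrinks too
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                # Pull each unit toward the input, weighted by grid distance
                # to the best matching unit.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical example posts, not data from the study.
tweets = [
    "smog in the city today hard to breathe",
    "smog in the city today hard to breathe",
    "clear sky and fresh air this morning",
    "heavy traffic and smog downtown",
]
vocab, vectors = vectorize(tweets)
som = train_som(vectors)
clusters = [best_matching_unit(som, v) for v in vectors]
```

Each entry in `clusters` is the grid cell a tweet maps to; tweets that share a cell can be read as one topic cluster. Comparing such clusters with monitoring-site measurements from the same time and place would be a separate step that this sketch does not cover.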
