Using Social Media to Detect Outdoor Air Pollution and Monitor Air Quality Index (AQI): A Geo-Targeted Spatiotemporal Analysis Framework with Sina Weibo (Chinese Twitter)

Outdoor air pollution is a serious problem in many developing countries. This study monitors dynamic changes in air quality in large cities by analyzing spatiotemporal trends in geo-targeted social media messages that have passed through comprehensive big data filtering procedures. We introduce a new social media analytic framework to (1) investigate the relationship between air pollution topics posted on Sina Weibo (Chinese Twitter) and the daily Air Quality Index (AQI) published by China’s Ministry of Environmental Protection; and (2) monitor the dynamics of the AQI using social media messages. Correlation analysis was used to compare discussion trends in social media messages with temporal changes in the AQI during 2012. We categorized relevant messages into three types: retweets, mobile app messages, and original individual messages. Of these, original individual messages had the highest correlation with the AQI. Based on this correlation analysis, individual messages were used to monitor the AQI in 2013. Our study indicates that filtered social media messages are strongly correlated with the AQI and can be used, to some extent, to monitor air quality dynamics.
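The correlation step described above can be illustrated with a minimal sketch. It assumes a Pearson correlation between daily message counts and daily AQI; the abstract does not name the coefficient used, so that choice is an assumption. The function name `pearson`, the variable names, and all numbers are hypothetical and made up for illustration; they are not data or results from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily AQI readings for one city over one week (made-up values).
daily_aqi = [80, 150, 210, 95, 300, 120, 60]

# Hypothetical daily counts of pollution-related messages, split into the
# three message types named in the abstract (made-up values).
message_counts = {
    "retweets": [40, 35, 90, 60, 70, 30, 55],
    "mobile_app": [12, 14, 13, 15, 16, 12, 11],
    "original_individual": [20, 41, 66, 25, 98, 30, 14],
}

# One coefficient per message type, each compared against the same AQI series.
correlations = {
    kind: pearson(counts, daily_aqi) for kind, counts in message_counts.items()
}

for kind, r in sorted(correlations.items(), key=lambda item: -item[1]):
    print(f"{kind}: r = {r:.3f}")
```

Computing one coefficient per message type, rather than one for the pooled counts, mirrors the comparison in the abstract: it shows which message type tracks the AQI most closely before any type is chosen for monitoring.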
