A long short-term memory approach to predicting air quality based on social media data

Abstract: Air pollutants such as PM2.5 (particulate matter with an aerodynamic equivalent diameter of less than 2.5 μm), PM10 (particulate matter with an aerodynamic equivalent diameter of less than 10 μm), NOx, and SOx are a global concern, especially in developing countries, because they may cause many chronic and fatal diseases. An important step in addressing air pollution is the timely and accurate prediction of air quality. Traditional methods are mainly based on meteorological data, regression models, remote sensing data and various retrieval methods. Numerous studies on deep learning methods have suggested that these approaches may be able to make accurate predictions for complex systems. In this paper, we propose a long short-term memory (LSTM) approach for predicting air quality that uses meteorological data and investigates Chinese social media as a proxy for public perceptions of and responses to air quality. We gathered daily air quality data, meteorological data and Weibo check-in data for Beijing, China from January 1, 2015 to December 31, 2016. The average sentiment of the related Weibo posts was selected as the public response proxy. We evaluated the proposed model on real data. The root-mean-square error (RMSE) and the mean absolute error (MAE) indicated that our method predicted PM2.5, PM10, O3, NO2, SO2 and CO concentrations more accurately than traditional methods. We also examined prediction performance during the 2015 China Victory Day Parade period, during which social and political factors played an important role in air quality prediction. The results indicated that the proposed method, which incorporates public response data, was especially suitable for predicting air quality during extreme short-term social events and provides timely social measurement and feedback on environmental problems.
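The two error metrics named in the abstract, RMSE and MAE, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names and the concentration values below are made up for illustration, and the sketch assumes the standard definitions of both metrics over paired daily observations and predictions.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical daily PM2.5 concentrations (μg/m³), not data from the paper.
observed_pm25 = [80.0, 120.0, 60.0, 100.0]
predicted_pm25 = [70.0, 130.0, 60.0, 120.0]

print("RMSE:", rmse(observed_pm25, predicted_pm25))
print("MAE:", mae(observed_pm25, predicted_pm25))
```

Because RMSE squares each error before averaging, it penalizes large individual misses more heavily than MAE does, so reporting both gives a view of typical error as well as sensitivity to outlying days.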
