When big data meets big smog: a big spatio-temporal data framework for China severe smog analysis

Severe smog has recently struck many cities in China, including the capital, Beijing. PM2.5, the chief culprit of China's smog, is affected by many factors, including air pollutants, weather, climate, geographical location, and urbanization. To analyze these factors, we collect about 35,000,000 air quality records and about 30,000,000 weather records from sensors in 77 Chinese cities in 2013. In addition, two big data sets, Geoname and DBPedia, are combined to supply data on climate, geographical location, and urbanization. To handle big spatio-temporal data for big smog analysis, we propose a MapReduce-based framework named BigSmog. It mainly performs parallel correlation analysis of the factors and scalable training of artificial neural networks (ANNs) for spatio-temporal approximation of the PM2.5 concentration. In the experiments, BigSmog displays high scalability for big smog analysis with big spatio-temporal data. The analysis results show that air pollutants influence the short-term concentration of PM2.5 more than the weather does, and that geographical location and climate, rather than urbanization, play the major role in determining a city's long-term PM2.5 pollution level. Moreover, the trained ANNs can accurately approximate the concentration of PM2.5.
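
The abstract does not say how the parallel correlation analysis is implemented. As a minimal sketch, assuming the correlation in question is the Pearson coefficient, it can be computed in a MapReduce style: each map task reduces its partition of (factor, PM2.5) pairs to six running sums, and the reduce step adds those sums together. The function names and the toy readings below are hypothetical illustrations, not part of BigSmog.

```python
from functools import reduce
from math import sqrt


def map_partition(records):
    """Map step: collapse one partition of (x, y) pairs into sufficient statistics."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in records:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return (n, sx, sy, sxx, syy, sxy)


def combine(a, b):
    """Reduce step: element-wise sum of two statistics tuples (associative)."""
    return tuple(u + v for u, v in zip(a, b))


def pearson(stats):
    """Pearson correlation coefficient from the combined sufficient statistics."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def parallel_pearson(partitions):
    """Run the map step on every partition, then fold the results together."""
    return pearson(reduce(combine, map(map_partition, partitions)))


# Toy partitions of (pollutant reading, PM2.5 reading) pairs; values are made up.
partitions = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
r = parallel_pearson(partitions)
```

Because `combine` is associative, the partitions can be processed on separate workers and merged in any order, which is what makes this formulation fit MapReduce. The sum-of-products form can lose precision when values are large relative to their spread, so a production version would likely use a numerically stabler merge of means and centered sums.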
