Hybrid Model of Air Quality Prediction Using K-Means Clustering and Deep Neural Network

With rapid economic development and the emission of large quantities of polluting gases, air pollution has become increasingly serious. Air quality prediction is an effective way to provide early warning of harmful air pollutants and thereby protect public health. This paper proposes a hybrid air quality prediction model that combines K-Means clustering with a deep neural network. The deep neural network, which performs regression, consists of a bidirectional LSTM (Long Short-Term Memory) network and a fully connected neural network. First, historical meteorological monitoring data from Qingdao City are taken as the research object, and the K-Means clustering algorithm divides the meteorological data into four categories by quarter of the year. Then the classified meteorological data, together with historical air pollutant concentration data, are used to train the neural network, and a suitable hyperparameter combination is selected through extensive trial and error. Next, the hybrid model is applied to the test set, and the mean square error (MSE) between the predicted and true values is used to evaluate predictive performance. Finally, a comparison with other algorithm models shows that the proposed hybrid model achieves higher precision in air quality prediction.
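The abstract does not include code, so the following Python sketch illustrates only two of the steps it names: partitioning meteorological observations into four groups with K-Means, and scoring predictions by mean squared error. It is a minimal illustration under stated assumptions, not the authors' implementation. The plain Lloyd's iteration, the deterministic farthest-point initialization, the (temperature, humidity) feature pair and all sample values are hypothetical, and the bidirectional LSTM regressor is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def farthest_point_init(points, k):
    """Hypothetical deterministic seeding (not stated in the paper): start from the
    first observation, then repeatedly add the observation farthest from every
    centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k=4, max_iter=100):
    """Lloyd's K-Means: alternate assignment and mean-update steps until the
    cluster assignments stop changing. Returns (centroids, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each observation to its nearest centroid.
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its member observations.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical daily (temperature in deg C, relative humidity in %) readings.
    readings = [
        [0.0, 40.0], [1.0, 41.0], [-1.0, 39.0],    # winter-like
        [12.0, 60.0], [13.0, 61.0], [11.0, 59.0],  # spring-like
        [28.0, 85.0], [29.0, 86.0], [27.0, 84.0],  # summer-like
        [15.0, 45.0], [16.0, 46.0], [14.0, 44.0],  # autumn-like
    ]
    centroids, labels = kmeans(readings, k=4)
    print("cluster labels:", labels)
    print("centroids:", centroids)
    # Score made-up pollutant forecasts against made-up observations.
    print("MSE:", mean_squared_error([35.0, 60.0, 80.0], [38.0, 55.0, 82.0]))
```

In practice the meteorological features would normally be standardized before clustering, because K-Means relies on Euclidean distance and raw units (degrees versus percent) would otherwise weight the features unequally. `nearest_centroid` shows how a new observation could be routed to one of the four learned categories before it is passed to a category-specific regressor.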