Machine Learning Approaches for Outdoor Air Quality Modelling: A Systematic Review

Current studies show that traditional deterministic models struggle to capture the non-linear relationship between the concentration of air pollutants and their sources of emission and dispersion. The most promising approach to overcoming this limitation is to use statistical models based on machine learning techniques. Nevertheless, it is often unclear why one algorithm is chosen over another for a given task. This systematic review aims to clarify this question by providing the reader with a comprehensive description of the principles underlying these algorithms and of how they are applied to enhance prediction accuracy. A rigorous search conforming to the PRISMA guidelines was performed and resulted in the selection of the 46 most relevant journal papers in the area. These studies are synthesized and linked to one another through a factorial analysis method. The main findings of this literature review are that: (i) machine learning is mainly applied on the Eurasian and North American continents and (ii) estimation problems tend to be addressed with Ensemble Learning and Regressions, whereas forecasting problems make use of Neural Networks and Support Vector Machines. The next challenges for this approach are to improve the prediction of pollution peaks and of contaminants that have recently come into the spotlight (e.g., nanoparticles).
