A random forest partition model for predicting NO2 concentrations from traffic flow and meteorological conditions.

High concentrations of nitrogen dioxide in the air, particularly in heavily urbanised areas, adversely affect residents' health in many ways, including short-term and long-term damage and unpleasant odour, among other effects. A method is proposed for modelling atmospheric NO2 concentrations in a conurbation using a partition model M consisting of two separate models: ML for lower concentration values and MU for upper values. The models are built with random forests, a machine learning technique that aggregates the predictions of many randomised decision trees. Using data recorded in Wrocław (Poland) in 2015-2017, an iterative method was applied to determine the boundary concentration ỹ at which the mean absolute deviation error of the partition model attained its lowest value. The resulting model had an R² value of 0.82, compared with 0.60 for a classical random forest model. As in the classical case, the variable importances in the model ML indicate that traffic flow has the greatest influence on NO2 concentrations, followed by meteorological factors, in particular wind direction and speed. In the model MU the variable importances are significantly different: while traffic flow still has the greatest impact, the effects of temperature and relative humidity are almost as great. This supports constructing separate models for low and high pollution concentrations.
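The iterative search for the boundary concentration can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the helper names (`LineFit`, `fit_partition`, `predict_partition`, `search_boundary`) are hypothetical, the data are synthetic, and the rule for routing a new observation to the lower or upper model (a preliminary prediction from a single model trained on all data) is assumed because the abstract does not describe it. To keep the example dependency-free, a simple least-squares line stands in for the random forest; any learner exposing `fit` and `predict`, such as scikit-learn's `RandomForestRegressor`, could be passed as `make_model` instead.

```python
# Sketch only: the learner, the routing rule and the data are illustrative assumptions.

class LineFit:
    """Stand-in learner (NOT a random forest): least-squares line on the first feature."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx = sum(xs) / len(xs)
        my = sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[0] for row in X]


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def fit_partition(X, y, boundary, make_model):
    """Train a lower model on targets <= boundary and an upper model on the rest."""
    low = [(x, t) for x, t in zip(X, y) if t <= boundary]
    high = [(x, t) for x, t in zip(X, y) if t > boundary]
    if len(low) < 2 or len(high) < 2:
        return None  # too few observations on one side to train a model
    # Assumed routing rule: a single model trained on all data acts as a gate.
    gate = make_model().fit(X, y)
    m_l = make_model().fit([x for x, _ in low], [t for _, t in low])
    m_u = make_model().fit([x for x, _ in high], [t for _, t in high])
    return gate, m_l, m_u, boundary


def predict_partition(model, X):
    """Use the lower model where the gate predicts <= boundary, else the upper one."""
    gate, m_l, m_u, boundary = model
    prelim = gate.predict(X)
    lower, upper = m_l.predict(X), m_u.predict(X)
    return [lo if p <= boundary else up for p, lo, up in zip(prelim, lower, upper)]


def search_boundary(X_tr, y_tr, X_val, y_val, candidates, make_model):
    """Try each candidate boundary and keep the one with the lowest validation MAE."""
    best = None
    for b in candidates:
        model = fit_partition(X_tr, y_tr, b, make_model)
        if model is None:
            continue
        err = mae(y_val, predict_partition(model, X_val))
        if best is None or err < best[1]:
            best = (b, err, model)
    if best is None:
        raise ValueError("no candidate boundary left enough data on both sides")
    return best


def truth(x):
    """Synthetic target whose behaviour changes above x = 50 (illustrative only)."""
    return float(x) if x <= 50 else 3.0 * x - 100.0


X_train = [[float(x)] for x in range(0, 101, 2)]
y_train = [truth(row[0]) for row in X_train]
X_val = [[float(x)] for x in range(1, 100, 2)]
y_val = [truth(row[0]) for row in X_val]

candidates = list(range(20, 200, 20))
single_err = mae(y_val, LineFit().fit(X_train, y_train).predict(X_val))
best_boundary, best_err, best_model = search_boundary(
    X_train, y_train, X_val, y_val, candidates, LineFit
)
print("single model MAE:", round(single_err, 2))
print("partition model MAE:", round(best_err, 2), "at boundary", best_boundary)
```

Because the synthetic target follows two different regimes, the partition model reaches a lower validation error than the single model here. Evaluating each candidate on held-out data rather than on the training set is a design choice of this sketch, not something stated in the abstract.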
