Big-Data-Driven Machine Learning for Enhancing Spatiotemporal Air Pollution Pattern Analysis

Air pollution is a major public health problem. Spatiotemporal analysis is a crucial step in understanding its complex characteristics. With many sensors recording observations at high temporal resolution, this task becomes a big data challenge. In this study, unsupervised machine learning algorithms were applied to analyze spatiotemporal patterns of air pollution. The analysis used PM10 data recorded at 1-h intervals over a period of one year by almost 100 sensors located in Krakow. K-means and SKATER clustering revealed distinct differences between the average and maximum values of pollutant concentrations. Compared with the SKATER algorithm, the K-means algorithm with Dynamic Time Warping (DTW) was more accurate in identifying yearly patterns and in clustering data that vary rapidly and spatially. Moreover, clustering the data after kriging greatly facilitated the interpretation of the results. These findings highlight the potential of machine learning techniques and big data analysis for identifying hot spots, cold spots, and patterns of air pollution, and for informing policy decisions related to urban planning, traffic management, and public health interventions.
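
To illustrate why DTW helps when sensor series show similar pollution episodes at slightly shifted times, the following is a minimal sketch in plain Python. It is not the study's implementation: the toy series are hypothetical values, the function names `dtw_distance` and `cluster_series` are introduced here for illustration, and the cluster update uses medoids (the most central member series) as a simplified stand-in for the averaging step of K-means with DTW.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences,
    using the absolute difference as the local cost."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a, step in b, step in both
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def cluster_series(series, k, n_iter=10):
    """K-medoids-style clustering under DTW (assumption: a medoid update
    replaces the barycenter averaging used by K-means with DTW)."""
    n = len(series)
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)]
            for i in range(n)]

    # Deterministic farthest-first initialization of the k medoids.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each series joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: the new medoid minimizes total DTW distance
        # to the other members of its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                new_medoids.append(
                    min(members,
                        key=lambda i: sum(dist[i][j] for j in members)))
            else:
                new_medoids.append(medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical toy PM10-like series: two sensors with the same peak
# shifted by one time step, and two sensors with a flat, high level.
toy_series = [
    [10, 12, 30, 12, 10, 10],
    [10, 10, 12, 30, 12, 10],
    [40, 42, 41, 40, 42, 41],
    [41, 40, 42, 41, 40, 42],
]
labels, medoids = cluster_series(toy_series, k=2)
shifted_peak_distance = dtw_distance(toy_series[0], toy_series[1])
```

In this toy case the two shifted-peak series have a DTW distance of zero and fall into the same cluster, whereas a point-by-point distance would penalize the one-step shift. For real hourly data over a year, the quadratic cost of this plain implementation would matter, so a windowed DTW or an optimized library would be a more practical choice.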
