A Spatial Feature Engineering Algorithm for Creating Air Pollution Health Datasets

Dear Editor: This manuscript is already available online as a TechRxiv preprint (https://doi.org/10.36227/techrxiv.12376427.v1). The co-authors were not correctly associated with the published preprint, so we are resubmitting it with the co-authors associated. We have also reduced the similarity index, which is 11% for this submission. Thank you.

Abstract: Air pollution is a significant cause of mortality and morbidity every year. In recent years, many researchers have studied the association between air pollution and health. These studies use two types of data: air pollution data and health data. Feature engineering is used to create and optimize air quality and health features. To merge the two datasets, studies use a residential address, a community, county, block or city, or a hospital or school address. Using a residential address or any other location becomes a spatial problem when the Air Quality Monitoring (AQM) stations are concentrated in urban areas within a region and their coverage areas overlap, which raises the question of how to associate each patient with the relevant AQM station. In addition, most studies do not take the distance between patients and AQM stations into account. In this study, we propose a four-part spatial feature engineering algorithm that finds the coordinates for health data, calculates distances to AQM stations, and associates health records with the nearest AQM station, thereby removing these limitations of current air pollution health datasets. The proposed algorithm is applied as a case study in Klang Valley, Malaysia. The results show that the proposed algorithm can generate an air pollution health dataset efficiently, and it also provides a radius option to exclude patients who are located far from the stations.
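The distance and association steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure the algorithm uses, so the haversine (great-circle) distance here is an assumption, as are the function names, the station identifiers, and the coordinates, all of which are hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations, radius_km=None):
    """Return (station_id, distance_km) for the closest station.

    Returns None when there are no stations, or when radius_km is given
    and the closest station lies farther away than that radius.
    """
    if not stations:
        return None
    # Compute the distance to every station and keep the smallest one.
    best_id, best_km = min(
        ((sid, haversine_km(lat, lon, slat, slon)) for sid, (slat, slon) in stations.items()),
        key=lambda pair: pair[1],
    )
    if radius_km is not None and best_km > radius_km:
        return None  # record excluded: too far from any station
    return best_id, best_km


# Hypothetical AQM stations (id -> (latitude, longitude)); not real station data.
STATIONS = {
    "S1": (3.10, 101.60),
    "S2": (3.00, 101.45),
}

# A hypothetical geocoded health record close to station S1.
match = nearest_station(3.11, 101.61, STATIONS, radius_km=10.0)
```

Passing `radius_km=None` keeps every record and only attaches the nearest station and its distance, while a finite radius drops records that are too far from any station to be represented by its measurements.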