Dear Editor: This manuscript is already available online as a preprint on TechRxiv (https://doi.org/10.36227/techrxiv.12376427.v1). Because the co-authors are not correctly associated with the published preprint, we are resubmitting it with the co-authors correctly associated. We have also improved the similarity report: the similarity index of this submission is 11%. Thank you.

Abstract: Air pollution is one of the significant causes
of mortality and morbidity every year. In recent years, many researchers have
focused their attention on the associations between air pollution and health. These
studies used two types of data, i.e., air pollution data and
health data. Feature engineering is used to create and optimize air quality and
health features. To merge these datasets, residential addresses,
community/county/block/city names, and hospital/school addresses are used. Using
residential addresses or other locations becomes a spatial problem when the Air
Quality Monitoring (AQM) stations are concentrated in the urban parts of a
region and their coverage areas overlap, which raises the question of how to
associate each patient with the relevant AQM
station. Moreover, most studies do not take into account the distance between
patients and the AQM stations. In this study, we propose a four-part
spatial feature engineering algorithm that finds the coordinates of health
records, calculates their distances to the AQM stations, and associates each
health record with the nearest AQM station, thereby removing these limitations
of current air pollution health datasets. The proposed algorithm is applied in
a case study of the Klang
Valley, Malaysia. The results show that the proposed algorithm can generate an
air pollution health dataset efficiently, and that it also provides a radius
option for excluding patients who are located far away from the stations.
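The distance, nearest-station association, and radius steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes health records have already been geocoded to (latitude, longitude) pairs, and it uses the haversine great-circle distance because the abstract does not name a distance measure. The names `haversine_km`, `associate`, `Association`, and `radius_km`, as well as all sample coordinates, are hypothetical.

```python
import math
from typing import Dict, List, NamedTuple, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


class Association(NamedTuple):
    """Hypothetical output row: a record paired with its nearest station."""
    record_id: str
    station_id: str
    distance_km: float


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def associate(records: Dict[str, Coord],
              stations: Dict[str, Coord],
              radius_km: Optional[float] = None) -> List[Association]:
    """Pair each geocoded record with its nearest station.

    If radius_km is given, records whose nearest station is farther than
    radius_km are excluded from the result (the "radius option").
    """
    if not stations:
        raise ValueError("at least one station is required")
    result = []
    for rid, coord in records.items():
        # Compute the distance to every station and keep the closest one.
        sid, dist = min(((s, haversine_km(coord, sc)) for s, sc in stations.items()),
                        key=lambda pair: pair[1])
        if radius_km is None or dist <= radius_km:
            result.append(Association(rid, sid, dist))
    return result


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only (not real AQM sites or patients).
    stations = {"S1": (3.10, 101.60), "S2": (3.20, 101.70)}
    records = {"P1": (3.11, 101.61), "P2": (3.19, 101.71), "P3": (3.60, 102.20)}
    for a in associate(records, stations, radius_km=10.0):
        print(f"{a.record_id} -> {a.station_id} ({a.distance_km:.2f} km)")
```

With these sample points and a 10 km radius, P1 and P2 are each paired with a station about 1.6 km away, while P3 (roughly 70 km from the nearest station) is excluded. The brute-force search compares every record with every station; for large datasets a spatial index would be a natural replacement, which is again an assumption rather than something the study reports.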