A new feature selection method based on machine learning technique for air quality dataset

Abstract In recent years, air pollution has become a serious concern, leading to millions of premature deaths, which motivates researchers to predict air quality in advance. The Air Quality Index (AQI) indicates the status of air quality. Its value is based on various pollutants such as PM10, PM2.5, NO2, SO2, CO, O3, NH3, and Pb. Of these, PM2.5 is now a major pollutant that heavily affects air quality, so it is the main focus of this study. Because PM2.5 itself depends on various parameters, this study proposes a new feature selection method, named the Causality Based Linear method, to select the most relevant parameters that affect pollution. Experiments were carried out on the air quality dataset of Delhi using existing machine learning techniques and the proposed method. The proposed method extracts wind speed, carbon monoxide, and nitrogen dioxide as the key parameters. Its accuracy was then compared with that of four existing methods in which feature selection was not considered, and the proposed method was found to give better accuracy with the key parameters.
