Feature Engineering Algorithms for Traffic Dataset

As the global population grows, traffic congestion in urban areas is worsening, leading to lost time, wasted fuel, and, most importantly, pollutant emissions. There is therefore a need to monitor and estimate traffic density. Automatic traffic management systems make it possible to record and monitor the movement of motor vehicles along a road segment. One challenge researchers face is that historical traffic data is often provided only as an annual average and is incomplete. Annual average daily traffic (AADT) is the average daily traffic volume on a roadway segment at a specific location over a year. One example of AADT data is that published by Road Traffic Volume Malaysia (RTVM), which is incomplete: RTVM provides the average daily traffic and a single peak hour. Traffic is recorded over sixteen hours, but the only hourly figure given is for 8:00 am to 9:00 am. Hence the hourly traffic volume for the remaining hours must be estimated. Feature engineering can be used to address this incompleteness. This paper proposes feature engineering algorithms, based on queuing theory, that can efficiently estimate hourly traffic volume and generate features from the existing dataset for all traffic census stations in Malaysia. The proposed algorithms were able to estimate the hourly traffic volume and generate features for three years at the Jalan Kepong census station, Kuala Lumpur, Malaysia. The algorithms were evaluated using Random Forest and Decision Tree models. The results show that the feature engineering algorithms improve the performance of the machine learning models, except for the prediction of NO2 using Random Forest, which showed the highest MAE, MSE, and RMSE when traffic data was included in the prediction.
The algorithm was applied to one traffic census station in Kuala Lumpur and can be used for the other stations in Malaysia. It can also be used with any annual average daily traffic data that includes average hourly data.

Keywords: feature engineering algorithm; queuing theory; Road Traffic Volume Malaysia (RTVM); machine learning algorithms
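
The evaluation reported above compares models by MAE, MSE, and RMSE. As a minimal sketch of how those three error measures are computed, the Python snippet below uses their standard definitions. The function name `regression_errors` and all sample values are hypothetical and chosen only for illustration; they are not taken from the paper or from the RTVM dataset.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)  # mean absolute error
    mse = sum(r * r for r in residuals) / len(residuals)   # mean squared error
    rmse = math.sqrt(mse)                                  # root mean squared error
    return mae, mse, rmse


# Hypothetical pollutant readings and two hypothetical sets of predictions,
# one from a model without traffic features and one with them.
observed = [10.0, 12.0, 14.0, 16.0]
without_traffic = [12.0, 12.0, 10.0, 16.0]
with_traffic = [11.0, 12.0, 13.0, 16.0]

for label, preds in [("without traffic", without_traffic),
                     ("with traffic", with_traffic)]:
    mae, mse, rmse = regression_errors(observed, preds)
    print(f"{label}: MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f}")
```

Lower values on all three measures indicate a closer fit. RMSE is expressed in the same unit as the predicted quantity, which makes it easier to read alongside MAE than MSE is.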
