On Missing Traffic Data Imputation Based on Fuzzy C-Means Method by Considering Spatial–Temporal Correlation

Missing traffic flow data seriously degrade the quality of data collection and analysis in traffic systems, and completing the missing data is one of the most important steps toward realizing the functions of intelligent transportation systems. This paper proposes an approach based on fuzzy C-means (FCM) clustering to impute missing traffic volume data from loop detectors. To exploit the spatial–temporal correlation between detectors, the conventional vector-based data structure is first transformed into a matrix-based data pattern. A genetic algorithm is then applied to optimize two parameters of the FCM model: the number of clusters and the weighting factor. Finally, actual traffic volume data collected at different locations are used as the test data set, and two indicators, root mean square error (RMSE) and relative accuracy, are used to compare the imputation performance of the proposed method with that of conventional methods (multiple linear regression, the autoregressive integrated moving average (ARIMA) model, and the historical average method) under different missing ratios. Results in four scenarios show that the FCM-based imputation method outperforms these conventional methods.
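
The abstract does not spell out the layout of the matrix-based data pattern. The following is a minimal sketch under one plausible assumption: each row is one day, and the columns are (detector, time interval) pairs, so a single row places the target detector's daily profile beside those of its neighboring detectors (spatial correlation) while consecutive intervals stay adjacent (temporal correlation). The function name to_matrix and this layout are hypothetical, not taken from the paper.

```python
from typing import List, Optional, Sequence

Volume = Optional[float]  # None marks a missing loop-detector reading


def to_matrix(series_by_detector: Sequence[Sequence[Volume]],
              intervals_per_day: int) -> List[List[Volume]]:
    """Reshape per-detector volume vectors into a days x (detectors * intervals) matrix.

    Hypothetical layout: row d is day d; columns are grouped by detector, each
    group holding that detector's volumes for every interval of the day.
    """
    if intervals_per_day <= 0:
        raise ValueError("intervals_per_day must be positive")
    if not series_by_detector:
        raise ValueError("need at least one detector")
    n = len(series_by_detector[0])
    if any(len(s) != n for s in series_by_detector):
        raise ValueError("all detectors must cover the same period")
    if n % intervals_per_day != 0:
        raise ValueError("series length must be a whole number of days")

    matrix: List[List[Volume]] = []
    for d in range(n // intervals_per_day):
        start = d * intervals_per_day
        row: List[Volume] = []
        for series in series_by_detector:
            # Append this detector's profile for day d (missing values stay None).
            row.extend(series[start:start + intervals_per_day])
        matrix.append(row)
    return matrix
```

For two detectors with three intervals per day, to_matrix([[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]], 3) yields one row per day, [1, 2, 3, 10, 20, 30] and [4, 5, 6, 40, 50, 60], so a clustering algorithm compares whole days across neighboring detectors rather than isolated readings.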
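
The abstract names FCM as the imputation engine, with the number of clusters c and the weighting factor m as its tunable parameters, but it does not give the update rules. The following is a minimal pure-Python sketch of one common way to impute with FCM, an alternating scheme in the spirit of Hathaway and Bezdek's optimal completion strategy: missing entries start at their column means, memberships and cluster centers are updated as in standard FCM, and after each iteration every missing entry is replaced by the membership-weighted average of the cluster centers. The function name fcm_impute, the farthest-point center initialization, and the convergence test are assumptions for illustration, not the paper's exact procedure.

```python
import math
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing traffic volume


def fcm_impute(data: Sequence[Row], c: int = 2, m: float = 2.0,
               max_iter: int = 200, tol: float = 1e-6) -> List[List[float]]:
    """Fill None entries via fuzzy C-means with c clusters and weighting exponent m > 1."""
    n, p = len(data), len(data[0])
    if not 1 <= c <= n or m <= 1.0:
        raise ValueError("need 1 <= c <= number of rows and m > 1")
    missing = [(j, k) for j in range(n) for k in range(p) if data[j][k] is None]

    # Initial guess: replace each missing entry with its column mean.
    x: List[List[float]] = [[0.0] * p for _ in range(n)]
    for k in range(p):
        obs = [row[k] for row in data if row[k] is not None]
        if not obs:
            raise ValueError(f"column {k} has no observed values")
        mean = sum(obs) / len(obs)
        for j in range(n):
            x[j][k] = mean if data[j][k] is None else float(data[j][k])

    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Farthest-point initialization (an assumption): row 0, then the row farthest from chosen centers.
    centers = [list(x[0])]
    while len(centers) < c:
        far = max(range(n), key=lambda j: min(dist2(x[j], v) for v in centers))
        centers.append(list(x[far]))

    e = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        # Memberships: u[j][i] proportional to (1 / d2_ij) ** (1 / (m - 1)), rows sum to 1.
        u = []
        for j in range(n):
            d2 = [dist2(x[j], v) for v in centers]
            dmin = min(d2)
            if dmin == 0.0:
                hits = [i for i in range(c) if d2[i] == 0.0]
                u.append([1.0 / len(hits) if d == 0.0 else 0.0 for d in d2])
            else:
                r = [(dmin / d) ** e for d in d2]  # scaled by dmin to avoid under/overflow
                s = sum(r)
                u.append([ri / s for ri in r])
        # Centers: weighted means of the (partly imputed) rows with weights u ** m.
        new_centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            tw = sum(w)
            new_centers.append([sum(w[j] * x[j][k] for j in range(n)) / tw for k in range(p)]
                               if tw > 0.0 else centers[i])
        # Missing entries: membership-weighted average of the new centers.
        for j, k in missing:
            w = [u[j][i] ** m for i in range(c)]
            x[j][k] = sum(w[i] * new_centers[i][k] for i in range(c)) / sum(w)
        shift = max(math.sqrt(dist2(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return x
```

Only the entries that were None are ever overwritten, so observed volumes pass through unchanged. With two well-separated groups of rows, for example low-volume and high-volume days, a row whose observed columns match the high-volume group receives a high membership in that cluster, and its missing column is pulled toward that cluster's center rather than toward the global column mean.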
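
The abstract states that a genetic algorithm optimizes the number of clusters and the weighting factor, but it gives neither the encoding, the operators, nor the fitness function. The following is a minimal sketch under these assumptions: c is integer-coded and m is real-coded, with tournament selection, blend crossover on m, uniform resampling of c and Gaussian perturbation of m as mutation, and elitism. The name ga_optimize, the default parameter ranges, and the rates are hypothetical. In practice the fitness could be, for instance, the negative RMSE of FCM imputation on deliberately hidden entries; here it is simply any callable to be maximized.

```python
import random
from typing import Callable, List, Tuple

Params = Tuple[int, float]  # (number of clusters c, weighting exponent m)


def ga_optimize(fitness: Callable[[int, float], float],
                c_range: Tuple[int, int] = (2, 10),
                m_range: Tuple[float, float] = (1.1, 3.0),
                pop_size: int = 20, generations: int = 40,
                mutation_rate: float = 0.2, seed: int = 0) -> Params:
    """Maximize fitness(c, m) with a small integer/real-coded genetic algorithm."""
    rng = random.Random(seed)
    c_lo, c_hi = c_range
    m_lo, m_hi = m_range

    def random_individual() -> Params:
        return rng.randint(c_lo, c_hi), rng.uniform(m_lo, m_hi)

    def tournament(scored: List[Tuple[float, Params]]) -> Params:
        # Binary tournament: return the fitter of two randomly drawn individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(c, m), (c, m)) for c, m in pop]
        scored.sort(key=lambda t: t[0], reverse=True)
        next_pop: List[Params] = [scored[0][1]]  # elitism: keep the best individual unchanged
        while len(next_pop) < pop_size:
            (c1, m1), (c2, m2) = tournament(scored), tournament(scored)
            # Crossover: inherit c from either parent, blend m between the parents.
            c = c1 if rng.random() < 0.5 else c2
            alpha = rng.random()
            m = alpha * m1 + (1.0 - alpha) * m2
            # Mutation: resample c, perturb m with Gaussian noise, then clip m to its range.
            if rng.random() < mutation_rate:
                c = rng.randint(c_lo, c_hi)
            if rng.random() < mutation_rate:
                m += rng.gauss(0.0, 0.1 * (m_hi - m_lo))
            next_pop.append((c, min(max(m, m_lo), m_hi)))
        pop = next_pop
    return max(pop, key=lambda p: fitness(*p))
```

As a quick check on a toy objective, ga_optimize(lambda c, m: -((c - 4) ** 2 + (m - 2.0) ** 2)) is expected to return c = 4 with m close to 2. Because every fitness call would rerun FCM on real data, the population size and number of generations trade search quality against computation time.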
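
The evaluation compares methods by RMSE and relative accuracy at several missing ratios. The sketch below shows one way to set this up: hide a chosen fraction of known volumes, impute them, and score only the hidden positions. RMSE is standard; the abstract does not define relative accuracy, so this sketch assumes the common convention of one minus the mean absolute percentage error, which may differ from the paper's formula. The function names mask_entries, rmse, and relative_accuracy are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def mask_entries(matrix: Sequence[Sequence[float]], ratio: float, seed: int = 0
                 ) -> Tuple[List[List[Optional[float]]], List[Tuple[int, int]]]:
    """Hide round(ratio * cells) randomly chosen entries (set to None) to simulate a missing ratio."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    cells = [(j, k) for j, row in enumerate(matrix) for k in range(len(row))]
    hidden = random.Random(seed).sample(cells, round(ratio * len(cells)))
    masked: List[List[Optional[float]]] = [list(row) for row in matrix]
    for j, k in hidden:
        masked[j][k] = None
    return masked, hidden


def rmse(actual: Sequence[float], imputed: Sequence[float]) -> float:
    """Root mean square error between true and imputed volumes."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))


def relative_accuracy(actual: Sequence[float], imputed: Sequence[float]) -> float:
    """Assumed definition: 1 - mean absolute percentage error (true volumes must be nonzero)."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative accuracy is undefined for a zero true volume")
    mape = sum(abs(a - p) / abs(a) for a, p in zip(actual, imputed)) / len(actual)
    return 1.0 - mape
```

For true volumes [100, 200, 400] imputed as [110, 190, 400], the RMSE is the square root of 200/3 (about 8.16) and the relative accuracy is 1 - (0.10 + 0.05 + 0)/3 = 0.95. Repeating mask, impute, and score for each missing ratio, and for each competing method, yields the kind of comparison the abstract describes.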
