The 7th International Conference on Ambient Systems, Networks and Technologies (ANT 2016)

Application of Principal Component Analysis for Outlier Detection in Heterogeneous Traffic Data

Level-of-service (LOS) measures for two-lane highways become incompatible when the prevailing traffic is heterogeneous. Such traffic therefore warrants LOS criteria built on compatible measures that capture its characteristics. This paper proposes percent speed-reduction and percent slower vehicles as the performance measures for defining LOS criteria. Defining such criteria is essentially a classification problem, and clustering can be an effective technique for solving it. However, heterogeneity in the traffic mix produces a significant proportion of outliers in the data set, which can distort the clustering results and lead to misleading or useless outcomes. The study adopts principal component analysis (PCA) as an efficient technique for detecting outliers and applies it to the proposed LOS measures. An iterative outlier-removal process shows that a significant proportion of the outliers consists of non-motorized traffic data; removing them accordingly ensures the reliability of the data set. The study concludes that assessing the LOS of the entire traffic stream, covering both motorized and non-motorized modes, on a common scale is infeasible.
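The iterative PCA-based outlier removal described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' exact procedure: it assumes each observation is a pair (percent speed-reduction, percent slower vehicles) and flags a point when its Hotelling T² score on the principal components exceeds the chi-square quantile with 2 degrees of freedom. The function names, the `alpha = 0.025` cutoff and the `max_iter` limit are hypothetical choices. Flagged points are removed and PCA is refitted until no point is flagged. With only two variables, the eigen-decomposition of the 2x2 covariance matrix has a closed form, so only the standard library is needed.

```python
import math
from typing import List, Tuple

# Assumed observation layout: (percent speed-reduction, percent slower vehicles)
Point = Tuple[float, float]


def pca_2d(points: List[Point]):
    """Closed-form PCA of 2-D data: returns the mean, the eigenvalues of the
    sample covariance matrix (largest first) and their unit eigenvectors."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    half_tr = (sxx + syy) / 2
    det = sxx * syy - sxy * sxy
    disc = math.sqrt(max(half_tr * half_tr - det, 0.0))
    l1, l2 = half_tr + disc, half_tr - disc
    # Eigenvector for the largest eigenvalue; the second is orthogonal to it
    if abs(sxy) > 1e-12:
        vx, vy = l1 - syy, sxy
    else:  # covariance is already diagonal
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    v1 = (vx / norm, vy / norm)
    v2 = (-v1[1], v1[0])
    return (mx, my), (l1, l2), (v1, v2)


def t_squared(p: Point, mean, eigvals, eigvecs, eps: float = 1e-12) -> float:
    """Hotelling T^2: sum of squared PC scores, each divided by its eigenvalue."""
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    total = 0.0
    for lam, (vx, vy) in zip(eigvals, eigvecs):
        if lam > eps:  # skip zero-variance (degenerate) components
            score = dx * vx + dy * vy  # projection onto this component
            total += score * score / lam
    return total


def iterative_pca_outliers(points: List[Point], alpha: float = 0.025,
                           max_iter: int = 50):
    """Refit PCA, drop points whose T^2 exceeds the chi-square (2 d.o.f.)
    (1 - alpha) quantile, and repeat until no point is flagged."""
    cutoff = -2.0 * math.log(alpha)  # chi-square(2) quantile has a closed form
    kept: List[Point] = list(points)
    removed: List[Point] = []
    for _ in range(max_iter):
        if len(kept) < 3:  # too few points for a meaningful covariance
            break
        mean, eigvals, eigvecs = pca_2d(kept)
        scores = [t_squared(p, mean, eigvals, eigvecs) for p in kept]
        flagged = [p for p, s in zip(kept, scores) if s > cutoff]
        if not flagged:
            break
        removed.extend(flagged)
        kept = [p for p, s in zip(kept, scores) if s <= cutoff]
    return kept, removed
```

For example, with 20 tightly grouped observations on a grid around (20, 30.5) plus one distant point at (90, 95), the first pass flags only the distant point and the refit on the remaining 20 flags nothing. A plain T² cutoff on the classical covariance can be masked when outliers are numerous, which is why robust PCA variants are sometimes preferred for heavily contaminated data.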
