Feature extraction and clustering analysis of highway congestion

Classification of congestion patterns is important in many areas of traffic planning and management, ranging from policy appraisal and database design to prediction and real-time control. One of the key constraints in applying machine learning techniques for classification is the availability of sufficient data (traffic patterns) with clear and undisputed labels, e.g. traffic pattern X or Y. The challenge is that labelling traffic patterns (e.g. combinations of congested and free-flow areas over time and space) is highly subjective. In our view, this means that the assessment of how well algorithms label the data should also include a qualitative component that focuses on what the found patterns really mean for traffic flow operations and applications. In this study, we investigate the application of clustering analysis to obtain labels automatically from the data: we first qualitatively assess how meaningful the found labels are, and subsequently test quantitatively how well the labels separate the resulting feature space. By transforming traffic measurements (speeds) into (colored) images, we propose two different approaches to extract the features of a large number of traffic patterns for clustering: point-based and area-based. The point-based approach is widely applied in the image processing literature and explores local interest points in images (i.e. where large changes in color intensity occur), whereas the new area-based approach combines domain knowledge with Watershed segmentation to partition the images into different spatiotemporal segments, from which domain-specific features, such as wide moving jam patterns, are extracted. The results show that the Watershed segmentation separates the traffic (congestion) patterns into more meaningful and separable classes, comparable to those that have been proposed in the literature.
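To make the area-based idea more concrete, the following is a minimal sketch, not the study's implementation. It assumes the speed "image" is simply a time × space matrix of speeds in km/h (rather than a colored image) and segments it with a marker-controlled watershed implemented by priority flooding. The marker rule (connected cells slower than a hypothetical `core_kmh` of 30 km/h), the free-flow mask (cells at or above a hypothetical `congested_kmh` of 70 km/h are never flooded) and the four per-segment descriptors are all illustrative assumptions; the study's own domain-specific features, such as wide moving jam patterns, are not reproduced here.

```python
import heapq
from collections import deque

# 4-connected neighbourhood on the (time, space) grid
OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def label_components(mask):
    """Label 4-connected True regions of a boolean grid (0 = background)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first fill of one component
                    i, j = queue.popleft()
                    for di, dj in OFFSETS:
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and labels[ni][nj] == 0):
                            labels[ni][nj] = count
                            queue.append((ni, nj))
    return labels, count


def watershed_segments(speeds, core_kmh=30.0, congested_kmh=70.0):
    """Marker-controlled watershed by priority flooding.

    Markers are connected regions slower than core_kmh (hypothetical
    congestion cores). Basins grow in order of increasing speed but never
    into cells at or above congested_kmh, which are treated as free flow.
    """
    rows, cols = len(speeds), len(speeds[0])
    labels, count = label_components(
        [[v < core_kmh for v in row] for row in speeds])
    heap = [(speeds[r][c], r, c)
            for r in range(rows) for c in range(cols) if labels[r][c]]
    heapq.heapify(heap)
    while heap:
        _, i, j = heapq.heappop(heap)  # lowest-speed flooded cell first
        for di, dj in OFFSETS:
            ni, nj = i + di, j + dj
            if (0 <= ni < rows and 0 <= nj < cols and labels[ni][nj] == 0
                    and speeds[ni][nj] < congested_kmh):
                labels[ni][nj] = labels[i][j]  # first basin to arrive wins
                heapq.heappush(heap, (speeds[ni][nj], ni, nj))
    return labels, count


def segment_features(speeds, labels, count):
    """Simple per-segment descriptors (hypothetical feature set)."""
    features = []
    for k in range(1, count + 1):
        cells = [(t, x) for t, row in enumerate(labels)
                 for x, v in enumerate(row) if v == k]
        times = [t for t, _ in cells]
        places = [x for _, x in cells]
        features.append({
            "area": len(cells),                            # cells in segment
            "duration": max(times) - min(times) + 1,       # time steps
            "extent": max(places) - min(places) + 1,       # locations
            "mean_speed": sum(speeds[t][x] for t, x in cells) / len(cells),
        })
    return features


if __name__ == "__main__":
    # rows = time steps, columns = detector locations, values in km/h
    speeds = [
        [100, 100, 100, 100, 100, 100],
        [100,  60,  20, 100, 100, 100],
        [100,  50,  15,  65, 100, 100],
        [100, 100,  60, 100,  25,  55],
        [100, 100, 100, 100,  60, 100],
    ]
    labels, count = watershed_segments(speeds)
    for f in segment_features(speeds, labels, count):
        print(f)
```

On this toy grid, the two slow cores grow into two separate congested segments, and the surrounding free-flow cells stay unlabelled. Like most priority-flood variants, this sketch assigns every reachable cell to a basin and draws no explicit watershed lines.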
Since there is no ground-truth set of labels, the quantitative assessment tests how well both methods are able to separate the respective feature spaces they construct for the (large) database of traffic patterns. We argue that the crisper this separation is, the better the labelling has turned out. For this quantitative comparison, we train a multinomial classifier that maps unseen patterns to the labels discovered by each of the two labelling approaches. The most important result is that the classifier using the area-based feature vector achieves the highest average level of confidence in its classification decisions, implying a highly separable feature space. We argue this is good news: not only does the combination of image processing (Watershed) and domain knowledge (traffic flow characteristics) lead to meaningful labels that can be automatically retrieved from large databases, but this method also leads to a more efficient separation of the resulting feature space. Our next endeavor is to further refine this method and use it to develop a search engine for the (rapidly growing) 200 TB historical database of traffic data hosted by the Dutch National Datawarehouse (NDW).
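The separability argument can be illustrated with a small sketch. It assumes that the multinomial classifier is a multinomial (softmax) logistic regression and that "confidence" means the highest posterior class probability averaged over the patterns. The fitting procedure here is plain batch gradient descent with a small L2 penalty, chosen for brevity; it is not necessarily the solver used in the study. The function names, learning rate and toy data are hypothetical. In a real comparison, the confidence would be computed on held-out patterns rather than on the training set.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_proba(W, x):
    """Class posteriors for feature vector x; the last weight of each row is a bias."""
    xb = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in W])


def train_multinomial(X, y, n_classes, lr=0.1, epochs=500, l2=1e-3):
    """Multinomial logistic regression fitted by batch gradient descent on the
    mean cross-entropy plus an L2 penalty (the bias is not penalised)."""
    d = len(X[0])
    n = len(X)
    W = [[0.0] * (d + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]
            p = predict_proba(W, xi)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == yi else 0.0)  # d(loss)/d(logit_k)
                for j in range(d + 1):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(d + 1):
                penalty = l2 * W[k][j] if j < d else 0.0
                W[k][j] -= lr * (grad[k][j] / n + penalty)
    return W


def mean_confidence(W, X):
    """Average of the highest posterior probability over the patterns in X."""
    return sum(max(predict_proba(W, x)) for x in X) / len(X)


if __name__ == "__main__":
    # Two well-separated groups of 2-D feature vectors ...
    X_sep = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.5, 5), (5, 5.5)]
    y_sep = [0, 0, 0, 1, 1, 1]
    # ... versus identical vectors carrying conflicting labels.
    X_mix = [(1, 1), (1, 1), (2, 2), (2, 2), (3, 3), (3, 3)]
    y_mix = [0, 1, 0, 1, 0, 1]
    W_sep = train_multinomial(X_sep, y_sep, n_classes=2)
    W_mix = train_multinomial(X_mix, y_mix, n_classes=2)
    print(mean_confidence(W_sep, X_sep))  # well above 0.5
    print(mean_confidence(W_mix, X_mix))  # prints 0.5: nothing to separate
```

The contrast mirrors the reasoning above. When the features separate the labels cleanly, the classifier becomes confident. When the features carry no information about the labels, the best it can do is chance level (0.5 for two classes).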
