Discovering patterns in traffic sensor data

We maintain a one-of-a-kind, large-scale, high-resolution (both spatially and temporally) traffic sensor dataset collected from the entire Los Angeles County road network. Traffic sensors installed under the road pavement measure real-time traffic flows through road segments. In this paper, we exploit this dataset to rigorously verify two popular intuitions about traffic flows on road segments: 1) each road segment has a typical traffic flow (known to local travelers), and road segments can often be categorized by the similarity of their traffic flows; and 2) the road segments within each category are similar not only in their traffic flows but also in other characteristics (such as locality and connectivity). To this end, we developed a hypothesis analysis framework based on a variety of clustering and correlation evaluation techniques and used it to show the following. First, the set of road segments can indeed be partitioned into distinct subpartitions with similar traffic flows, and there is a limited number of signature traffic patterns (labels), each of which accurately represents all traffic flows of one subpartition of the road segments. Second, all segments within each subpartition (represented by one signature) are also highly similar in three other characteristics: direction, connectivity, and locality. Our experiments verify these observations with high confidence.
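The two hypotheses can be illustrated with a small sketch. It is not the framework used in the paper: the abstract only mentions "a variety of clustering and correlation evaluation techniques", so plain k-means and a cluster purity score are stand-ins chosen here for illustration. The 24-value hourly profiles, the peak hours, the noise level, and the `inbound`/`outbound` direction labels are all synthetic assumptions, and every function name is hypothetical.

```python
import math
import random
from collections import Counter


def farthest_point_init(profiles, k):
    """Pick k starting centroids deterministically: the first profile,
    then repeatedly the profile farthest from all chosen centroids."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(profiles, k, max_iters=100):
    """Plain k-means (Lloyd's iterations) over fixed-length flow profiles.
    Returns one cluster label per segment and one centroid per cluster;
    each centroid plays the role of a 'signature' traffic pattern."""
    centroids = farthest_point_init(profiles, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def purity(labels, attribute):
    """Fraction of segments whose attribute value matches the majority
    value of their cluster (1.0 means every cluster is homogeneous)."""
    matched = 0
    for c in set(labels):
        values = [a for a, lab in zip(attribute, labels) if lab == c]
        matched += Counter(values).most_common(1)[0][1]
    return matched / len(labels)


# Synthetic data (assumption): 5 segments with a morning peak and
# 5 segments with an evening peak, one flow value per hour of the day.
rng = random.Random(0)


def daily_profile(peak_hour):
    return [
        200 + 800 * math.exp(-((h - peak_hour) ** 2) / 8) + rng.gauss(0, 30)
        for h in range(24)
    ]


profiles = [daily_profile(8) for _ in range(5)] + [daily_profile(17) for _ in range(5)]
directions = ["inbound"] * 5 + ["outbound"] * 5  # hypothetical attribute

labels, signatures = kmeans(profiles, k=2)
print("cluster labels:", labels)
print("direction purity:", purity(labels, directions))
```

Here the first hypothesis corresponds to the segments separating cleanly into a few clusters, each summarized by one centroid, and the second corresponds to a high purity score for an attribute that was never given to the clustering step. On real sensor data the number of clusters would have to be chosen or estimated rather than fixed at 2.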
