Spatiotemporal summarization of traffic data streams

With resource-efficient summarization and accurate reconstruction of historical traffic sensor data, one can effectively manage and optimize transportation systems (e.g., road networks) to become smarter (better mobility, less congestion, shorter travel time, and lower travel cost) and greener (less wasted fuel and lower greenhouse gas emissions). Existing data summarization (and archival) techniques are generic and are not designed to leverage the unique characteristics of traffic data for effective data reduction. In this paper, we propose and explore a family of data summaries that exploit two forms of redundancy for effective data reduction: the high temporal correlation among readings from an individual sensor, and the high spatial correlation among readings from a group of co-located sensors. In particular, for each individual sensor or group of co-located sensors, these summaries derive and maintain a "signature" as well as a series of "outliers" for the readings received. Signatures capture the typical readings, which estimate the actual readings with bounded error; outliers record the actual readings wherever the error bound is violated. By combining signatures and outliers, our proposed data summaries represent the actual data with a much smaller storage footprint, while allowing for efficient querying of the sensor data with bounded error. Our experiments with a real traffic sensor dataset show that our proposed data summaries use only 23% of the storage space otherwise required for storing the actual data, while allowing for highly accurate query results with guaranteed precision.
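The signature-plus-outliers idea can be illustrated with a minimal sketch. The abstract does not say how signatures are computed, so everything here is an assumption made for illustration: the signature is taken to be the per-time-slot median across days for a single sensor, the error bound is an absolute threshold `eps`, and the names `summarize` and `reconstruct` are hypothetical rather than taken from the paper.

```python
from statistics import median


def summarize(days, eps):
    """Build a (signature, outliers) summary for one sensor.

    days: list of equal-length lists, one list of readings per day.
    eps:  absolute error bound (assumed form of the bound).
    """
    slots = len(days[0])
    # Assumed signature: the median reading observed at each time slot.
    signature = [median(day[t] for day in days) for t in range(slots)]
    # Outliers: actual readings the signature misses by more than eps.
    outliers = {
        (d, t): day[t]
        for d, day in enumerate(days)
        for t in range(slots)
        if abs(day[t] - signature[t]) > eps
    }
    return signature, outliers


def reconstruct(signature, outliers, d, t):
    """Return the stored outlier if there is one, else the signature value."""
    return outliers.get((d, t), signature[t])


# Three days of speed readings at three time slots; day 2 has an incident.
days = [[60, 30, 55], [62, 28, 57], [61, 10, 56]]
signature, outliers = summarize(days, eps=3)
print(signature, outliers)
```

Under these assumptions, every reconstructed reading is either exact (an outlier) or within `eps` of the actual value, and storage shrinks to one signature plus the outlier entries instead of every raw reading.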
