Stream Traffic Data Archival, Querying, and Analysis with TransDec

Transportation assume no liability for the contents or use thereof. The contents do not necessarily reflect the official views or policies of the State of California or the Department of Transportation. This report does not constitute a standard, specification, or regulation.

ABSTRACT

The goal of this research was to extend the traffic data analysis capabilities of the TransDec (short for Transportation Decision-Making) system, which was developed under the METRANS 09-26 research grant. TransDec is a real-data-driven system that supports decision-making in transportation systems. With TransDec, we have so far addressed the challenges of visualizing, querying, and managing dynamic, large-scale spatiotemporal transportation data, in particular traffic sensor data and moving-asset data. With this proposal, building on our experience in implementing TransDec, we extended our research and technology development efforts under three specific tasks. First, we developed new techniques to create a streaming data archival repository that supports continuous querying and analysis of the vast amount of California transit data from RIITS (Regional Integration of Intelligent Transportation Systems), which is generated in the form of data streams. Second, we extended the current data tier of TransDec to a distributed design to enable a more scalable and stable computing environment. Finally, to demonstrate the benefits of the archived traffic datasets, we presented a novel proof-of-concept application, namely a time-dependent optimal sequenced route (TD-OSR) planner that uses congestion prediction. This application exploits a subset of the real-world RIITS datasets, and we are currently evaluating ways to make it available for public use.
