Highly scalable trip grouping for large-scale collective transportation systems

Transportation-related problems such as road congestion, parking shortages, and pollution are increasing in most cities. To reduce traffic, recent work has proposed methods for vehicle sharing, for example sharing cabs by grouping "close-by" cab requests, which lowers transportation cost and makes better use of cab space. However, the methods published so far do not scale to large data volumes, and such scalability is necessary for large-scale collective transportation systems, e.g., ride-sharing systems for large cities. This paper presents highly scalable trip grouping algorithms that generalize previous techniques and support input rates that can be orders of magnitude larger. Three contributions make the grouping algorithms scalable. First, the basic grouping algorithm is expressed as a continuous stream query in a data stream management system, which allows it to handle a very large flow of requests. Second, following the divide-and-conquer paradigm, four space-partitioning policies for dividing the input data stream into sub-streams are developed and implemented using continuous stream queries. Third, using the partitioning policies, parallel implementations of the grouping algorithm in a parallel computing environment are described. Extensive experimental results show that the parallel implementation using simple adaptive partitioning methods can achieve speed-ups of several orders of magnitude without significantly degrading the quality of the grouping.
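The divide-and-conquer idea can be illustrated with a minimal Python sketch. It is not the paper's stream-query implementation: the static uniform grid, the greedy seed-based grouping rule, the Manhattan distance threshold, and all names (`Request`, `partition_by_grid`, `group_greedy`, `group_partitioned`, `cell_size`, `max_dist`, `capacity`) are assumptions made here for illustration only. The sketch splits requests into spatial cells by pickup location and then groups close-by requests independently within each cell, which is the step a parallel implementation could run on separate nodes.

```python
from collections import defaultdict
from typing import NamedTuple


class Request(NamedTuple):
    """Hypothetical cab request, reduced to an id and a pickup point."""
    rid: int
    x: float
    y: float


def partition_by_grid(requests, cell_size):
    """Assign each request to the grid cell containing its pickup point.

    A static uniform grid is an assumption of this sketch; it stands in
    for a space-partitioning policy but is not taken from the paper.
    """
    cells = defaultdict(list)
    for r in requests:
        key = (int(r.x // cell_size), int(r.y // cell_size))
        cells[key].append(r)
    return cells


def group_greedy(requests, max_dist, capacity):
    """Greedily group requests whose pickups lie close to a seed request.

    The first ungrouped request seeds a group; later requests join it if
    their Manhattan distance to the seed is at most max_dist and the
    group still has room. Returns a list of groups of request ids.
    """
    remaining = list(requests)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed]
        rest = []
        for r in remaining:
            close = abs(r.x - seed.x) + abs(r.y - seed.y) <= max_dist
            if close and len(group) < capacity:
                group.append(r)
            else:
                rest.append(r)
        remaining = rest
        groups.append([r.rid for r in group])
    return groups


def group_partitioned(requests, cell_size, max_dist, capacity):
    """Group each grid cell independently and concatenate the results.

    Each cell's grouping depends only on that cell's requests, so the
    loop body could be executed in parallel, one cell per worker.
    """
    cells = partition_by_grid(requests, cell_size)
    groups = []
    for key in sorted(cells):
        groups.extend(group_greedy(cells[key], max_dist, capacity))
    return groups
```

Under these assumptions, two requests that are close to each other but fall on opposite sides of a cell boundary end up in different groups, whereas grouping the whole input at once would have combined them. This is a small-scale picture of the trade-off between speed-up and grouping quality that the experiments evaluate.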
