Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data

Trajectory data are prevalent in systems that monitor the locations of moving objects. In a location-based service, for instance, the positions of vehicles are continuously monitored through GPS, and the trajectory of each vehicle describes its movement history. We study joins on two sets of trajectories, generated by two sets M and R of moving objects. For each object in M, the join returns its k nearest neighbors from R. We examine how this query can be evaluated in cloud environments. The problem is nontrivial because trajectories are complex and both the spatial and temporal dimensions of the data have to be handled. To facilitate this operation, we propose a parallel solution framework based on MapReduce. We also develop a novel bounding technique that enables trajectories to be pruned in parallel. Our approach can be used to parallelize existing single-machine trajectory join algorithms. We also study a variant of the join, which can further improve query efficiency. To evaluate the efficiency and scalability of our solutions, we have performed extensive experiments on large real and synthetic datasets.
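
As a rough illustration of the join semantics described above, the following single-machine sketch computes, for each object in M, its k nearest neighbors from R by brute force. It is not the paper's MapReduce framework or its bounding technique. The trajectory representation (a dict from timestamp to position), the distance (minimum Euclidean distance over shared timestamps), and the names `trajectory_distance` and `knn_join` are all assumptions made for this example.

```python
import heapq
import math

# Assumed representation: a trajectory is a dict {timestamp: (x, y)}.
# This layout is illustrative only and is not taken from the paper.

def trajectory_distance(a, b):
    """Assumed metric: minimum Euclidean distance over shared timestamps."""
    shared = a.keys() & b.keys()
    if not shared:
        return math.inf  # no temporal overlap, treat as infinitely far
    return min(math.dist(a[t], b[t]) for t in shared)

def knn_join(M, R, k):
    """For each object in M, return the ids of its k nearest objects in R."""
    result = {}
    for m_id, m_traj in M.items():
        # Score every candidate in R; no pruning is attempted here.
        scored = (
            (trajectory_distance(m_traj, r_traj), r_id)
            for r_id, r_traj in R.items()
        )
        result[m_id] = [r_id for _, r_id in heapq.nsmallest(k, scored)]
    return result

M = {"m1": {0: (0.0, 0.0), 1: (1.0, 0.0)}}
R = {
    "r1": {0: (0.0, 3.0), 1: (1.0, 1.0)},  # closest approach to m1: 1.0
    "r2": {0: (5.0, 0.0), 1: (4.0, 0.0)},  # closest approach to m1: 3.0
    "r3": {0: (0.0, 0.5), 1: (9.0, 9.0)},  # closest approach to m1: 0.5
}
print(knn_join(M, R, 2))  # {'m1': ['r3', 'r1']}
```

The brute-force loop compares every pair of trajectories, which is the cost that a parallel framework with pruning aims to avoid on large datasets.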

[14]  Reynold Cheng,et al.  Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data , 2016, IEEE Transactions on Knowledge and Data Engineering.

[15]  Farnoush Banaei Kashani,et al.  Voronoi-Based Geospatial Query Processing with MapReduce , 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.

[16]  Yannis Theodoridis,et al.  On the Generation of Spatiotemporal Datasets , 1999 .

[17]  Reynold Cheng,et al.  Evaluating multi-way joins over discounted hitting time , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[18]  Neta A. Bahcall,et al.  The Spatial correlation function of RICH clusters of galaxies , 1983 .

[19]  Michael Stonebraker,et al.  Skew-Aware Join Optimization for Array Databases , 2015, SIGMOD Conference.

[20]  Thomas Brinkhoff,et al.  A Framework for Generating Network-Based Moving Objects , 2002, GeoInformatica.

[21]  Sridhar Ramaswamy,et al.  Scalable Sweeping-Based Spatial Join , 1998, VLDB.

[22]  J. K. Kearney,et al.  Stream Editing for Animation , 1990 .

[23]  Yu Zheng,et al.  Computing with Spatial Trajectories , 2011, Computing with Spatial Trajectories.

[24]  Beng Chin Ooi,et al.  Efficient Processing of k Nearest Neighbor Joins using MapReduce , 2012, Proc. VLDB Endow..

[25]  Philip A. Ianna,et al.  The solar neighborhood IV: discovery of the twentieth nearest star , 1997 .

[26]  Walid G. Aref,et al.  SINA: scalable incremental processing of continuous queries in spatio-temporal databases , 2004, SIGMOD '04.

[27]  Chen Li,et al.  Efficient parallel set-similarity joins using MapReduce , 2010, SIGMOD Conference.

[28]  Lei Chen,et al.  Finding time period-based most frequent path in big trajectory data , 2013, SIGMOD '13.

[29]  Jignesh M. Patel,et al.  Design and evaluation of trajectory join algorithms , 2009, GIS.

[30]  Feifei Li,et al.  Efficient parallel kNN joins for large data in MapReduce , 2012, EDBT '12.