Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data

Trajectory data are prevalent in systems that monitor the locations of moving objects. In a location-based service, for instance, the positions of vehicles are continuously monitored through GPS; the trajectory of each vehicle describes its movement history. We study joins on two sets of trajectories, generated by two sets $M$ and $R$ of moving objects. For each entity in $M$, a join returns its $k$ nearest neighbors from $R$. We examine how this query can be evaluated in cloud environments. The problem is nontrivial because trajectories are complex objects and both the spatial and temporal dimensions of the data have to be handled. To facilitate this operation, we propose a parallel solution framework based on MapReduce. We also develop a novel bounding technique, which enables trajectories to be pruned in parallel. Our approach can be used to parallelize existing single-machine trajectory join algorithms. We also study a variant of the join, which can further improve query efficiency. To evaluate the efficiency and the scalability of our solutions, we have performed extensive experiments on large real and synthetic datasets.
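
To make the join semantics concrete, the following is a minimal single-machine sketch of a brute-force $k$-nearest-neighbor join, not the parallel MapReduce framework or the bounding technique proposed in the paper. It rests on assumptions that the abstract does not state: each trajectory is a hypothetical mapping from timestamp to an (x, y) position, and the distance between two trajectories is taken to be the smallest Euclidean distance at any timestamp they share. The names `trajectory_distance` and `knn_join` are illustrative only.

```python
import heapq
import math


def trajectory_distance(a, b):
    """Assumed distance: smallest Euclidean distance at any shared timestamp.

    Returns infinity when the two trajectories never overlap in time.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return math.inf
    return min(math.dist(a[t], b[t]) for t in shared)


def knn_join(M, R, k):
    """For each trajectory in M, return the ids of its k nearest trajectories in R."""
    result = {}
    for m_id, m_traj in M.items():
        # Score every candidate in R; a pruning bound would avoid this full scan.
        scored = (
            (trajectory_distance(m_traj, r_traj), r_id)
            for r_id, r_traj in R.items()
        )
        result[m_id] = [r_id for _, r_id in heapq.nsmallest(k, scored)]
    return result


# Toy data: timestamp -> (x, y)
M = {"m1": {0: (0.0, 0.0), 1: (1.0, 0.0)}}
R = {
    "r1": {0: (0.0, 3.0), 1: (1.0, 1.0)},
    "r2": {0: (5.0, 5.0), 1: (6.0, 5.0)},
    "r3": {0: (0.0, 0.5), 1: (4.0, 0.0)},
}

print(knn_join(M, R, 2))
```

This baseline compares every pair of trajectories, so its cost grows with $|M| \cdot |R|$; that quadratic cost is what motivates partitioning the work across machines and pruning candidates early.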

[3]  Reynold Cheng,et al.  Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data , 2016, IEEE Transactions on Knowledge and Data Engineering.
