REPOSE: Distributed Top-k Trajectory Similarity Search with Local Reference Point Tries

Trajectory similarity computation is a fundamental component in a variety of real-world applications, such as ridesharing, road planning, and transportation optimization. Recent advances in mobile devices have enabled an unprecedented increase in the amount of available trajectory data, to the point that a single machine can no longer support efficient query processing. Distributed in-memory trajectory similarity search is therefore needed. However, existing distributed proposals either waste computing resources or cannot support the range of similarity measures in use. We propose REPOSE, a distributed in-memory management framework for processing top-k trajectory similarity queries on Spark. We develop a reference point trie (RP-Trie) index to organize trajectory data for local search. In addition, we design a novel heterogeneous global partitioning strategy to eliminate load imbalance in distributed settings. We report on extensive experiments with real-world data that offer insight into the performance of the solution and show that it is capable of outperforming state-of-the-art proposals.
