FRESH: Fréchet Similarity with Hashing

This paper studies the r-range search problem for curves under the continuous Fréchet distance: given a dataset S of n polygonal curves and a threshold \(r>0\), construct a data structure that, for any query curve q, efficiently returns all entries of S within distance at most r from q. We propose FRESH, an approximate and randomized approach for r-range search. FRESH relies on a locality-sensitive hashing scheme to detect candidate near neighbors of the query curve, and then applies a pruning step based on a cascade of curve simplifications. We experimentally compare FRESH with exact, deterministic solutions and show that high performance can be reached by suitably relaxing precision and recall.
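The two-stage pipeline (hash to find candidates, then filter them) can be illustrated with the minimal Python sketch below. It makes several assumptions that should not be read as FRESH itself. The hash snaps each vertex to a randomly shifted grid of side `delta` and drops consecutive duplicate cells, a common grid-based way to hash curves. The verification uses the discrete Fréchet distance, swapped in for the continuous one because it is simpler to compute. A cheap endpoint lower bound stands in for the simplification cascade, which is not implemented here. All names (`GridLSH`, `range_search`, `delta`, `num_tables`) are hypothetical.

```python
import math
import random
from collections import defaultdict

Point = tuple[float, float]
Curve = list[Point]


def discrete_frechet(p: Curve, q: Curve) -> float:
    """Discrete Fréchet distance via the Eiter-Mannila dynamic program.

    Used here as a stand-in for the continuous Fréchet distance (assumption).
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


class GridLSH:
    """Hypothetical grid-snapping hash for curves, with one table per random shift."""

    def __init__(self, delta: float, num_tables: int, seed: int = 0):
        rng = random.Random(seed)
        self.delta = delta
        self.shifts = [(rng.uniform(0, delta), rng.uniform(0, delta))
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _key(self, curve: Curve, shift: Point) -> tuple:
        # Snap every vertex to its grid cell and drop consecutive repeated cells.
        cells = []
        for x, y in curve:
            cell = (math.floor((x + shift[0]) / self.delta),
                    math.floor((y + shift[1]) / self.delta))
            if not cells or cells[-1] != cell:
                cells.append(cell)
        return tuple(cells)

    def insert(self, idx: int, curve: Curve) -> None:
        for shift, table in zip(self.shifts, self.tables):
            table[self._key(curve, shift)].append(idx)

    def candidates(self, query: Curve) -> set[int]:
        # Union of the buckets the query falls into, over all tables.
        found = set()
        for shift, table in zip(self.shifts, self.tables):
            found.update(table.get(self._key(query, shift), []))
        return found


def range_search(data: list[Curve], index: GridLSH, query: Curve, r: float) -> list[int]:
    """Return indices of candidate curves whose (discrete) Fréchet distance is <= r."""
    result = []
    for idx in sorted(index.candidates(query)):
        curve = data[idx]
        # Lower bound: every coupling matches the first vertices and the last vertices.
        if max(math.dist(curve[0], query[0]), math.dist(curve[-1], query[-1])) > r:
            continue
        if discrete_frechet(curve, query) <= r:
            result.append(idx)
    return result
```

Because every reported curve passes an exact distance check, this sketch never reports false positives. Recall, however, depends on the query sharing a bucket with its near neighbors in at least one table, which is the precision/recall trade-off that the hashing stage introduces.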
