Novel Exact and Approximate Algorithms for the Closest Pair Problem

The closest pair problem (CPP) is an important problem with numerous applications in clustering, graph partitioning, image processing, pattern identification, and intrusion detection, among others. Many algorithms have been presented for solving the CPP. For instance, on n points there exists an O(n log n) time algorithm for the CPP when the dimension is a constant. There also exist randomized algorithms with an expected linear run time. However, these algorithms do not perform well in practice, and the algorithms that are employed in practice have a worst-case quadratic run time. One of the best performing algorithms for the CPP is MK, originally designed for solving the time series motif finding problem. In this paper we present an elegant exact algorithm for the CPP, called MPR, that performs better than MK. We also present approximation algorithms for the CPP that can be more than 40 times faster than MK while maintaining very good accuracy.
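For orientation, the worst-case quadratic baseline mentioned above can be sketched as follows. This is a plain brute-force scan written for illustration only; it is not MPR or MK, and the function name `closest_pair_brute_force` and the choice of Euclidean distance are assumptions, not details taken from the paper.

```python
from itertools import combinations
from math import dist, inf


def closest_pair_brute_force(points):
    """Return (distance, i, j) for the closest pair of points.

    Illustrative O(n^2) baseline: every pair is compared once.
    Assumes Euclidean distance and at least two points.
    """
    best = (inf, -1, -1)
    # Enumerate all unordered pairs of (index, point).
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = dist(p, q)  # Euclidean distance in any dimension
        if d < best[0]:
            best = (d, i, j)
    return best


if __name__ == "__main__":
    pts = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (9.0, 9.0)]
    print(closest_pair_brute_force(pts))
```

Practical algorithms of this kind typically improve on the scan by pruning pairs that cannot beat the best distance found so far, which leaves the worst case quadratic but reduces the work done on typical inputs.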
