The Closest Pair Problem under the Hamming Metric

Finding the closest pair among a given set of points under the Hamming metric is a fundamental problem with many applications. Let $n$ be the number of points and $D$ the dimensionality of the points. We show that for $0 < D \le n^{0.294}$, the problem over the binary alphabet can be solved in time $O\left(n^{2+o(1)}\right)$, whereas for $n^{0.294} < D \le n$, it can be solved in time $O\left(n^{1.843} D^{0.533}\right)$. We also provide an alternative approach that does not involve algebraic matrix multiplication; it runs in time $O\left(n^2D/\log^2 D\right)$ with a small constant factor and is effective in practice. Moreover, for an arbitrarily large alphabet, we obtain an algorithm with time complexity $O\left(n^2\sqrt{D}\right)$ for $0 < D \le n^{0.294}$ and $O\left(n^{1.921} D^{0.767}\right)$ for $n^{0.294} < D \le n$. In addition, the algorithms proposed in this paper provide a solution to the open problem stated by Kao et al.
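To make the problem statement concrete, the following is a minimal sketch of the naive baseline that the bounds above improve on; it is not the paper's algorithm. It assumes binary points and uses hypothetical function names. The first routine compares bit-packed points with XOR and a population count. The second uses the standard identity $d(x, y) = |x| + |y| - 2\langle x, y\rangle$ for binary vectors, which reduces all pairwise distances to inner products and so suggests why matrix multiplication is relevant here; the inner products are computed naively in this sketch.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary vectors packed into integers."""
    return bin(a ^ b).count("1")


def closest_pair_bruteforce(points):
    """Return (distance, i, j) for the closest pair by checking all pairs.

    `points` is a list of bit-packed integers (hypothetical input format).
    """
    return min(
        (hamming(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )


def closest_pair_inner_product(rows):
    """Same result from 0/1 lists, using d(x, y) = |x| + |y| - 2 <x, y>.

    The inner products are entries of the Gram matrix of the points;
    here they are computed one at a time rather than by fast matrix
    multiplication.
    """
    weights = [sum(r) for r in rows]  # |x| for each point
    best = None
    for i, j in combinations(range(len(rows)), 2):
        dot = sum(a * b for a, b in zip(rows[i], rows[j]))
        candidate = (weights[i] + weights[j] - 2 * dot, i, j)
        if best is None or candidate < best:
            best = candidate
    return best


rows = [[1, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
packed = [int("".join(map(str, r)), 2) for r in rows]
print(closest_pair_bruteforce(packed))
print(closest_pair_inner_product(rows))
```

Both routines examine all $n(n-1)/2$ pairs, so they serve only as a reference point for the sub-quadratic and matrix-multiplication-based bounds stated in the abstract.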

[1] F. MacWilliams, et al. The Theory of Error-Correcting Codes, 1977.

[2] Moshe Lewenstein, et al. Closest Pair Problems in Very High Dimensions, 2004, ICALP.

[3] R. Motwani, et al. On Diameter Verification and Boolean Matrix Multiplication, 1995.

[4] Geoffrey Zweig, et al. The bit vector intersection problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[5] Victor Y. Pan, et al. Fast Rectangular Matrix Multiplication and Applications, 1998, J. Complex.

[6] F. Frances Yao, et al. Multi-index hashing for information retrieval, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[7] Wojciech Rytter, et al. Fast Recognition of Pushdown Automaton and Context-free Languages, 1986, Inf. Control.

[8] Don Coppersmith, et al. Matrix multiplication via arithmetic progressions, 1987, STOC.

[9] Ron M. Roth, et al. Bounds for Binary Codes With Narrow Distance Distributions, 2007, IEEE Transactions on Information Theory.

[10] Ming-Yang Kao, et al. Randomized Fast Design of Short DNA Words, 2005, ICALP.

[11] Alexandr Andoni, et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[12] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.

[13] Don Coppersmith, et al. Rectangular Matrix Multiplication Revisited, 1997, J. Complex.

[14] Yannis Manolopoulos, et al. C2P: Clustering based on Closest Pairs, 2001, VLDB.

[15] Nicola Santoro, et al. A Practical Algorithm for Boolean Matrix Multiplication, 1988, Inf. Process. Lett.

[16] Victor S. Miller, et al. Optimal hash functions for approximate closest pairs on the n-cube, 2008, ArXiv.

[17] Ely Porat, et al. L1 pattern matching lower bound, 2008, Inf. Process. Lett.