A Reliable Randomized Algorithm for the Closest-Pair Problem

The following two computational problems are studied:

Duplicate grouping: Assume that $n$ items are given, each of which is labeled by an integer key from the set $\{0,\ldots,U-1\}$. Store the items in an array of size $n$ such that items with the same key occupy a contiguous segment of the array.

Closest pair: Assume that a multiset of $n$ points in the $d$-dimensional Euclidean space is given, where $d \ge 1$ is a fixed integer. Each point is represented as a $d$-tuple of integers in the range $\{0,\ldots,U-1\}$ (or of arbitrary real numbers). Find a closest pair, i.e., a pair of points whose distance is minimal over all such pairs.

In 1976, Rabin described a randomized algorithm for the closest-pair problem that takes linear expected time. As a subroutine, he used a hashing procedure whose implementation was left open. Only years later were randomized hashing schemes suitable for filling this gap developed.

In this paper, we return to Rabin's classic algorithm to provide a fully detailed description and analysis, thereby also extending and strengthening his result. As a preliminary step, we study randomized algorithms for the duplicate-grouping problem. In the course of solving the duplicate-grouping problem, we describe a new universal class of hash functions of independent interest.

It is shown that both of the foregoing problems can be solved by randomized algorithms that use $O(n)$ space and finish in $O(n)$ time with probability tending to 1 as $n$ grows to infinity. The model of computation is a unit-cost RAM capable of generating random numbers and of performing arithmetic operations from the set $\{+,-,\cdot,\mathrm{div},\log_2,\exp_2\}$, where $\mathrm{div}$ denotes integer division and $\log_2$ and $\exp_2$ are the mappings from $\mathbb{N}$ to $\mathbb{N}\cup\{0\}$ with $\log_2(m)=\lfloor \log_2 m \rfloor$ and $\exp_2(m)=2^m$ for all $m\in\mathbb{N}$. If the operations $\log_2$ and $\exp_2$ are not available, the running time of the algorithms increases by an additive term of $O(\log\log U)$.
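A minimal Python sketch of duplicate grouping by hashing, under stated assumptions: keys are integers in $\{0,\ldots,U-1\}$, Python lists stand in for RAM arrays, and the hash function is drawn from the multiply-shift family $h_a(x) = ((a x) \bmod 2^k)\ \mathrm{div}\ 2^{k-l}$ with a random odd $a < 2^k$, for which two distinct keys collide with probability at most $2/2^l$. This family is used only for illustration and is not claimed to be exactly the new class described in the paper; the names multiply_shift and group_duplicates are hypothetical. Each distinct key receives a dense group number through a chained hash table, and a counting-sort pass then writes every group into a contiguous segment.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def multiply_shift(k: int, l: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h(x) = ((a*x) mod 2^k) div 2^(k-l) for a random odd a < 2^k.

    Maps keys in [0, 2^k) to bucket numbers in [0, 2^l); requires 1 <= l <= k.
    Illustrative multiply-shift family (an assumption, not the paper's exact class).
    """
    a = rng.randrange(1, 1 << k, 2)      # random odd multiplier
    mask = (1 << k) - 1                  # "mod 2^k" as a bit mask
    return lambda x: ((a * x) & mask) >> (k - l)


def group_duplicates(items: Sequence[T], key: Callable[[T], int], U: int,
                     seed=None) -> List[T]:
    """Return the items rearranged so that items with equal keys are contiguous.

    Keys must be integers in {0, ..., U-1}. Expected O(n) time.
    """
    rng = random.Random(seed)
    n = len(items)
    l = max(1, n.bit_length())           # table size 2^l > n
    k = max(l, (U - 1).bit_length())     # every key fits in k bits
    h = multiply_shift(k, l, rng)

    # Pass 1: give each distinct key a dense group number 0..m-1,
    # using a chained hash table indexed by h(key).
    table: List[List[tuple]] = [[] for _ in range(1 << l)]
    group_of: List[int] = []
    m = 0
    for it in items:
        x = key(it)
        bucket = table[h(x)]
        for y, g in bucket:
            if y == x:                   # key seen before: reuse its group
                break
        else:                            # new key: open a new group
            g = m
            bucket.append((x, g))
            m += 1
        group_of.append(g)

    # Pass 2: counting sort on group numbers writes each group contiguously.
    start = [0] * (m + 1)
    for g in group_of:
        start[g + 1] += 1
    for g in range(m):
        start[g + 1] += start[g]         # start[g] = first slot of group g
    out: List[T] = [None] * n            # type: ignore[list-item]
    for it, g in zip(items, group_of):
        out[start[g]] = it
        start[g] += 1
    return out
```

For example, grouping [(3, 'a'), (1, 'b'), (3, 'c'), (2, 'd'), (1, 'e')] by the first component yields [(3, 'a'), (3, 'c'), (1, 'b'), (1, 'e'), (2, 'd')]: groups appear in order of first occurrence, and items keep their input order within a group. Because the table has more than $n$ slots, the expected number of other distinct keys sharing a chain is below 2, so the sketch runs in expected $O(n)$ time; the paper's stronger high-probability bound is not reproduced here.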
All numbers manipulated by the algorithms consist of $O(\log n+\log U)$ bits. The algorithms for both of the problems exceed the time bound $O(n)$ or $O(n+\log\log U)$ with probability $2^{-n^{\Omega(1)}}$. Variants of the algorithms are also given that use only $O(\log n+\log U)$ random bits and have probability $O(n^{-\alpha})$ of exceeding the time bounds, where $\alpha\ge 1$ is a constant that can be chosen arbitrarily.

The algorithms for the closest-pair problem also work if the coordinates of the points are arbitrary real numbers, provided that the RAM is able to perform arithmetic operations from $\{+,-,\cdot,\mathrm{div}\}$ on real numbers, where $a\ \mathrm{div}\ b$ now means $\lfloor a/b \rfloor$. In this case, the running time is $O(n)$ with $\log_2$ and $\exp_2$ and $O(n+\log\log(\delta_{\max}/\delta_{\min}))$ without them, where $\delta_{\max}$ is the maximum and $\delta_{\min}$ is the minimum distance between any two distinct input points.
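The following Python sketch illustrates the grid idea behind Rabin-style closest-pair algorithms for real coordinates, with $a\ \mathrm{div}\ b = \lfloor a/b \rfloor$ used to compute grid cells. Assumptions: the sampling rule (the minimum distance over $n$ random pairs of distinct indices) is one simple choice and not necessarily the rule analyzed in the paper, a Python dict stands in for the duplicate-grouping step, and closest_pair is a hypothetical name (math.dist requires Python 3.8 or later). Correctness does not depend on the sample: the sampled distance $\delta$ is an actual distance, so a closest pair is at most $\delta$ apart, and two coordinates that differ by at most $\delta$ fall into cells whose indices differ by at most 1. Only the running time depends on $\delta$ being small.

```python
import itertools
import math
import random
from typing import Sequence, Tuple


def closest_pair(points: Sequence[Tuple[float, ...]], seed=None) -> Tuple[float, int, int]:
    """Return (distance, i, j) with i < j for a closest pair of points.

    Points are d-tuples of reals (a multiset, so duplicates are allowed); n >= 2.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    d = len(points[0])
    rng = random.Random(seed)

    # Step 1 (assumed sampling rule): minimum distance over n random pairs.
    best = None
    for _ in range(n):
        i, j = rng.sample(range(n), 2)
        dist = math.dist(points[i], points[j])
        if best is None or dist < best[0]:
            best = (dist, min(i, j), max(i, j))
    delta = best[0]
    if delta == 0:
        return best                      # two equal points: 0 cannot be beaten

    # Step 2: group points by grid cell of side delta, using a div b = floor(a/b).
    cells = {}
    for idx, p in enumerate(points):
        cell = tuple(int(c // delta) for c in p)
        cells.setdefault(cell, []).append(idx)

    # Step 3: a closest pair is at most delta apart, so its two points lie in
    # the same cell or in adjacent cells (indices differ by at most 1 per axis).
    offsets = list(itertools.product((-1, 0, 1), repeat=d))
    for cell, members in cells.items():
        for off in offsets:
            neighbours = cells.get(tuple(c + o for c, o in zip(cell, off)), ())
            for i in members:
                for j in neighbours:
                    if i < j:            # examine each unordered pair once
                        dist = math.dist(points[i], points[j])
                        if dist < best[0]:
                            best = (dist, i, j)
    return best
```

Each point is compared only against points in its $3^d$ surrounding cells, a constant for fixed $d$. If the sampled $\delta$ were much larger than the true minimum, many points could share a cell and the inner loops would degrade toward quadratic time; bounding the probability of such events is what the paper's analysis addresses.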