Data structures and algorithms for nearest neighbor search in general metric spaces

We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidean cases, kd-tree performance is compared.
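
To make the idea concrete, the following is a minimal sketch of a vantage point tree, not the paper's own algorithm. It assumes a uniformly random choice of vantage point and a median split found by sorting, and the names `VPNode`, `build`, `nearest`, and `manhattan` are illustrative only. Each node keeps one vantage point and the median distance from it to the remaining points; the search uses the triangle inequality to skip a subtree that cannot contain a closer point.

```python
import random


class VPNode:
    """One node of the sketch tree (hypothetical name, not from the paper)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance >= radius


def build(points, dist, rng=None):
    """Build a tree over `points` for any metric `dist(a, b)`."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    # Assumption: a random vantage point; other selection rules are possible.
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    # Sort the remaining points by distance to the vantage point.
    ranked = sorted(((dist(vp, p), p) for p in points), key=lambda t: t[0])
    mid = (len(ranked) - 1) // 2
    radius = ranked[mid][0]
    inside = [p for _, p in ranked[: mid + 1]]
    outside = [p for _, p in ranked[mid + 1:]]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def nearest(node, query, dist, best=(float("inf"), None)):
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    d = dist(query, node.point)
    if d < best[0]:
        best = (d, node.point)
    if d <= node.radius:
        # Query lies inside the ball: search inside first.
        best = nearest(node.inside, query, dist, best)
        # The outside can only help if the search ball reaches the boundary.
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        if d - best[0] <= node.radius:
            best = nearest(node.inside, query, dist, best)
    return best


def manhattan(a, b):
    """L1 distance, used here only as an example of a non-Euclidean metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [(data_rng.random(), data_rng.random()) for _ in range(200)]
    tree = build(data, manhattan)
    print(nearest(tree, (0.5, 0.5), manhattan))
```

With balanced splits, this sketch evaluates the metric O(n log(n)) times during construction. Sorting at every level adds a further logarithmic factor in comparisons, which a linear-time median selection would avoid. Only the metric function is passed in, so the same code applies to any distance that satisfies the triangle inequality.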
