Paper information - On the complexity of reverse similarity search


Two decision problems are presented that arise from reversing the operation of a distance-based indexing tree. Whereas similarity search finds points in the tree given a query point, reverse similarity search begins with a set of constraints like those defining a leaf and generates a point meeting the constraints. These problems derive from robust hashing, a technique used in similarity search and security applications. The problems are analysed for spaces of strings and vectors with a variety of metrics: strings with Hamming distance; the usual (Levenshtein) edit distance; an edit distance we introduce called Superghost distance; arbitrary weighted tree metrics; and real vectors with Minkowski L_p metrics (of which the Euclidean distance is a special case). They are found to inhabit different complexity classes depending on the metric. In particular, the reverse similarity search problem derived from a VP- or GH-tree is NP-complete for any L_p metric, except that it is in P for a GH-tree with the Euclidean metric.
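To make the notion of "generating a point meeting the constraints" concrete, the following is a minimal sketch over binary strings with Hamming distance. The constraint shapes are assumptions, not the paper's formal definitions: a VP-style constraint is taken to be "inside or outside a ball around a vantage point", and a GH-style constraint is taken to be "at least as close to one point as to another". The function names and the sample constraints are hypothetical, and the search is a brute-force enumeration chosen only for illustration; it takes exponential time in the string length and is not an algorithm from the paper.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def satisfies_vp(x, constraints):
    """Assumed VP-style constraints: (vantage_point, radius, inside).

    inside=True requires d(x, vantage_point) <= radius;
    inside=False requires d(x, vantage_point) > radius.
    """
    return all((hamming(x, v) <= r) == inside for v, r, inside in constraints)


def satisfies_gh(x, constraints):
    """Assumed GH-style constraints: (near, far) requires d(x, near) <= d(x, far)."""
    return all(hamming(x, a) <= hamming(x, b) for a, b in constraints)


def reverse_search(n, predicate):
    """Return some binary string of length n satisfying predicate, or None.

    Brute force over all 2**n strings; illustrative only.
    """
    for bits in product("01", repeat=n):
        x = "".join(bits)
        if predicate(x):
            return x
    return None


# Hypothetical sample constraints, as if read off a path to a leaf.
vp_constraints = [("0000", 1, True), ("1111", 2, False)]
gh_constraints = [("0000", "1111"), ("1100", "0011")]
contradictory = [("0000", 0, True), ("0000", 0, False)]

vp_witness = reverse_search(4, lambda x: satisfies_vp(x, vp_constraints))
gh_witness = reverse_search(4, lambda x: satisfies_gh(x, gh_constraints))
no_witness = reverse_search(4, lambda x: satisfies_vp(x, contradictory))
```

The decision version of the problem asks only whether `reverse_search` would return a point or `None`. For real vectors under the Euclidean metric, a constraint of the form d(x, a) <= d(x, b) is equivalent to a linear inequality in x, which is consistent with the abstract's statement that the GH-tree case is in P, although the paper's own argument is not reproduced here.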

Matthew Skala
