On the complexity of reverse similarity search

Two decision problems are presented that arise from reversing the operation of a distance-based indexing tree. Whereas similarity search finds points in the tree given a query point, reverse similarity search begins with a set of constraints, like those defining a leaf, and generates a point meeting the constraints. These problems derive from robust hashing, a technique used in similarity search and security applications. The problems are analysed for spaces of strings and vectors with a variety of metrics: strings with Hamming distance, with the usual (Levenshtein) edit distance, and with an edit distance we introduce called Superghost distance; arbitrary weighted tree metrics; and real vectors with Minkowski L_p metrics (of which the Euclidean distance is a special case). The problems are found to inhabit different complexity classes depending on the metric. In particular, the reverse similarity search problem derived from a VP- or GH-tree is NP-complete for any L_p metric, with one exception: it is in P for a GH-tree with the Euclidean metric.
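As an illustration of why the Euclidean GH-tree case is different, note that a GH-tree constraint has the form "x is at least as close to a as to b", and under the Euclidean metric this is equivalent to a single linear inequality in x. The sketch below is not taken from the paper; the function names and sample points are hypothetical, and it only checks the equivalence on a few points. It assumes Python 3.8 or later for `math.dist`.

```python
import math

def gh_halfspace(a, b):
    """Return (w, c) such that dist(x, a) <= dist(x, b) iff dot(w, x) <= c.

    Follows from expanding ||x - a||^2 <= ||x - b||^2, which gives
    2 (b - a) . x <= ||b||^2 - ||a||^2.
    """
    w = [2.0 * (bi - ai) for ai, bi in zip(a, b)]
    c = sum(bi * bi for bi in b) - sum(ai * ai for ai in a)
    return w, c

def closer_to_a(x, a, b):
    # The GH-tree constraint stated directly in terms of Euclidean distance.
    return math.dist(x, a) <= math.dist(x, b)

def in_halfspace(x, w, c):
    # The same constraint stated as a linear inequality.
    return sum(wi * xi for wi, xi in zip(w, x)) <= c

# Hypothetical vantage points and sample query points (illustration only).
a, b = (0.0, 0.0), (4.0, 0.0)
w, c = gh_halfspace(a, b)
samples = [(1.0, 5.0), (3.0, 0.0), (-2.0, 7.0), (6.0, -1.0)]
agreement = [closer_to_a(x, a, b) == in_halfspace(x, w, c) for x in samples]
```

A conjunction of such constraints is therefore a system of linear inequalities, so finding a point that satisfies all of them is a linear feasibility question. The same rewriting is not available for a VP-tree constraint, which bounds the distance to a single point, or for other L_p metrics.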
