Re-ranking Permutation-Based Candidate Sets with the n-Simplex Projection

In the realm of metric search, permutation-based approaches have proven highly effective for indexing and supporting approximate search on large databases. These methods embed metric objects into a permutation space in which candidate results for a given query can be identified efficiently. Typically, to achieve high effectiveness, the permutation-based result set is refined by directly comparing each candidate object with the query. One drawback of these approaches is therefore that the original dataset must be stored and accessed during the refinement step. We propose a refinement approach based on a metric embedding, called the n-Simplex projection, that can be applied to metric spaces satisfying the n-point property (i.e., spaces in which any n points can be isometrically embedded in (n-1)-dimensional Euclidean space). The n-Simplex projection provides upper and lower bounds on the actual distance, derived from the distances between the data objects and a finite set of pivots. We propose to reuse the distances already computed for building the data permutations to derive these bounds, and we show how to use them to improve the permutation-based results. Our approach is particularly advantageous whenever the traditional refinement step is too costly, e.g., for very large datasets or very expensive metric functions.
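The idea can be illustrated with a short sketch. The Python code below is a minimal, illustrative implementation rather than the authors' original one: it builds the base simplex of the pivots from their inter-pivot distances, projects objects and queries into apexes using the same object-pivot distances that produce the permutations, and re-ranks a permutation-based candidate list with the resulting lower and upper bounds. All names (`build_base`, `project`, `bounds`, `rerank`) are illustrative, the pivot set is assumed to induce a non-degenerate simplex, and the toy usage stands in Euclidean vectors for a generic metric space with the n-point property.

```python
import numpy as np


def _apex(base, dists):
    """Coordinates of a point whose distances to the vertices in `base` are
    `dists`.  `base` is a (k, k-1) array in which vertex 0 is the origin and
    vertex i has nonzero entries only in its first i coordinates.  The result
    has k coordinates: k-1 in the base hyperplane plus an 'altitude'."""
    k = len(dists)
    x = np.zeros(k)
    for i in range(1, k):
        v = base[i]
        # From ||x||^2 = d_0^2 and ||x - v_i||^2 = d_i^2 it follows that
        #   sum_j x[j] * v_i[j] = (d_0^2 - d_i^2 + ||v_i||^2) / 2,
        # which yields x[i-1] once x[0 .. i-2] are known.
        rhs = (dists[0] ** 2 - dists[i] ** 2 + np.dot(v, v)) / 2.0
        x[i - 1] = (rhs - np.dot(x[:i - 1], v[:i - 1])) / v[i - 1]
    # Altitude above the base hyperplane (clamped for numerical safety).
    x[k - 1] = np.sqrt(max(dists[0] ** 2 - np.dot(x[:k - 1], x[:k - 1]), 0.0))
    return x


def build_base(pivot_dists):
    """Base simplex of the n pivots, built incrementally from their n x n
    inter-pivot distance matrix; returns an (n, n-1) coordinate matrix."""
    n = pivot_dists.shape[0]
    base = np.zeros((n, n - 1))
    base[1, 0] = pivot_dists[0, 1]
    for i in range(2, n):
        base[i, :i] = _apex(base[:i, :i - 1], pivot_dists[i, :i])
    return base


def project(base, obj_pivot_dists):
    """n-Simplex projection: the n object-pivot distances (the same ones used
    to build the object's permutation) become an n-dimensional apex."""
    return _apex(base, np.asarray(obj_pivot_dists, dtype=float))


def permutation(obj_pivot_dists, prefix=None):
    """Permutation representation: pivot ids ordered by increasing distance
    from the object, optionally truncated to a prefix."""
    order = np.argsort(obj_pivot_dists)
    return order if prefix is None else order[:prefix]


def bounds(q_apex, o_apex):
    """Lower and upper bounds on the true distance d(q, o) from the apexes."""
    head = q_apex[:-1] - o_apex[:-1]
    core = np.dot(head, head)
    lwb = np.sqrt(core + (q_apex[-1] - o_apex[-1]) ** 2)
    upb = np.sqrt(core + (q_apex[-1] + o_apex[-1]) ** 2)
    return lwb, upb


def rerank(q_apex, candidate_apexes):
    """Re-rank a permutation-based candidate set by the mean of the two
    bounds, without ever accessing the original data objects."""
    scored = [(cid, sum(bounds(q_apex, ox)) / 2.0) for cid, ox in candidate_apexes]
    return sorted(scored, key=lambda t: t[1])


# Toy usage: Euclidean vectors standing in for a generic metric space.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((1000, 20))
    pivots = data[rng.choice(len(data), 8, replace=False)]
    base = build_base(np.linalg.norm(pivots[:, None] - pivots[None, :], axis=-1))

    obj_pivot_d = np.linalg.norm(data[:, None] - pivots[None, :], axis=-1)
    apexes = [project(base, d) for d in obj_pivot_d]

    q = rng.standard_normal(20)
    q_apex = project(base, np.linalg.norm(q - pivots, axis=1))
    candidates = range(100)  # stand-in for a permutation-based candidate set
    print(rerank(q_apex, [(i, apexes[i]) for i in candidates])[:5])
```

Scoring candidates by the mean of the lower and upper bound is only one plausible choice for this sketch; re-ranking by the lower bound alone would be an equally natural variant.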
