Bitpart: Exact metric search in high(er) dimensions

Abstract. We define BitPart (Bitwise representations of binary Partitions), a novel exact search mechanism intended for use in high-dimensional spaces. In outline, a fixed set of reference objects is used to define a large set of regions within the original space, and each data item is characterised according to its containment within these regions. In contrast with other mechanisms, only a subset of this information is selected, according to the query, before a search within the re-cast space is performed. Partial data representations are accessed only if they are known to be potentially useful in calculating the exact query solution. Our mechanism requires Ω(N log N) space to evaluate a query, where N is the cardinality of the data, and therefore does not scale as well as previously defined mechanisms with low-dimensional data. However, it has recently been shown that, for nearest-neighbour search in high dimensions, a sequential scan of the data is essentially unavoidable. This result has long been suspected and, in this context, has been referred to as the curse of dimensionality. In light of this result, the compromise achieved by this work is to make the best possible use of the available fast memory, and to offer great potential for parallel query evaluation. To our knowledge, it gives the best compromise currently known for performing exact search over data whose dimensionality is too high to allow the useful application of metric indexing, yet is still sufficiently low to give at least some traction from the metric and supermetric properties.
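The outline above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Euclidean metric, uses only ball-shaped regions (one per reference object, with the radius set to the median distance from that object), and stores each region's containment information as a Python integer used as a bitset. All function and variable names are hypothetical. For a range query with a given threshold, a region's bit column is read only when the triangle inequality guarantees that every solution lies entirely inside or entirely outside that region. The surviving candidates are then checked against the original metric, so the result is exact.

```python
import math
import random
import statistics


def build_columns(data, regions):
    """Return one bitset per region; bit j is set if data[j] lies inside it."""
    columns = []
    for centre, radius in regions:
        bits = 0
        for j, x in enumerate(data):
            if math.dist(x, centre) <= radius:
                bits |= 1 << j
        columns.append(bits)
    return columns


def range_search(data, regions, columns, query, threshold):
    """Exact range search: indexes j with dist(data[j], query) <= threshold."""
    all_bits = (1 << len(data)) - 1
    candidates = all_bits
    for (centre, radius), column in zip(regions, columns):
        d = math.dist(query, centre)
        if d + threshold <= radius:
            # Every solution must lie inside this ball.
            candidates &= column
        elif d - threshold > radius:
            # Every solution must lie outside this ball.
            candidates &= ~column & all_bits
        # Otherwise the region cannot exclude anything, so its column is skipped.
    # Filter the remaining candidates with the original metric.
    return [
        j
        for j in range(len(data))
        if (candidates >> j) & 1 and math.dist(data[j], query) <= threshold
    ]


# Illustrative data: 500 uniform random points in 8 dimensions (an assumption).
random.seed(0)
data = [[random.random() for _ in range(8)] for _ in range(500)]
references = data[:20]  # reference objects taken from the data set
regions = [
    (p, statistics.median(math.dist(p, x) for x in data)) for p in references
]
columns = build_columns(data, regions)

query = [random.random() for _ in range(8)]
hits = range_search(data, regions, columns, query, 0.6)
brute = [j for j, x in enumerate(data) if math.dist(x, query) <= 0.6]
```

Each bit column here is independent of the others, so the bitwise conjunctions could in principle be evaluated in parallel or in chunks; the sketch keeps them sequential for clarity.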
