Authenticated Multi-Step Nearest Neighbor Search

Multi-step processing is commonly used for nearest neighbor (NN) and similarity search in applications involving high-dimensional data and/or costly distance computations. Today, many such applications require a proof of result correctness. In this setting, clients issue NN queries to a server that maintains a database signed by a trusted authority. The server returns the NN set along with supplementary information that permits result verification using the dataset signature. Unfortunately, an adaptation of the multi-step NN algorithm incurs prohibitive network overhead due to the transmission of false hits, i.e., records that are not in the NN set but are nevertheless necessary for its verification. To alleviate this problem, we present a novel technique that reduces the size of each false hit. Moreover, we generalize our solution to a distributed setting, where the database is horizontally partitioned over several servers. Finally, we demonstrate the effectiveness of the proposed solutions on real datasets of various dimensionalities.
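To make the notion of a false hit concrete, the following is a minimal sketch of generic multi-step (filter-and-refine) k-NN search. It is not the authenticated scheme proposed here, and it does not reduce the size of false hits. The function names, the choice of lower bound (Euclidean distance over the first d coordinates), and the sample records are assumptions made for illustration only.

```python
import heapq
import math


def lower_bound(q, p, d=2):
    # Assumed cheap filter distance: Euclidean distance on the first d
    # coordinates only. It never exceeds the full distance.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q[:d], p[:d])))


def full_distance(q, p):
    # The "costly" distance over all dimensions.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, p)))


def multi_step_knn(query, records, k=1, d=2):
    # Filter step: rank all records by their lower-bound distance.
    candidates = sorted((lower_bound(query, p, d), i)
                        for i, p in enumerate(records))
    best = []     # max-heap of (-full_distance, index), holds at most k entries
    fetched = []  # records whose full representation had to be retrieved
    for lb, i in candidates:
        # Stop once the next lower bound exceeds the current k-th NN distance:
        # no remaining record can enter the result.
        if len(best) == k and lb > -best[0][0]:
            break
        fetched.append(i)
        dist = full_distance(query, records[i])  # refinement step
        if len(best) < k:
            heapq.heappush(best, (-dist, i))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, i))
    result_ids = [i for _, i in sorted((-nd, i) for nd, i in best)]
    # False hits: fetched during refinement but not part of the NN set.
    false_hits = [i for i in fetched if i not in result_ids]
    return result_ids, false_hits


if __name__ == "__main__":
    query = (0, 0, 0, 0)
    records = [(1, 0, 0, 0), (0.5, 0, 3, 0), (0, 0.8, 0, 2), (5, 5, 0, 0)]
    print(multi_step_knn(query, records, k=1))  # ([0], [1, 2])
```

In this toy run, records 1 and 2 look closer than record 0 under the lower bound, so they must be examined in full before record 0 can be confirmed as the NN. In an authenticated setting, such records would have to be sent to the client for verification, which is the overhead the abstract refers to.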
