On Efficient Distance-Based Similarity Search

In this paper, we address two sub-problems within the broad topic of similarity search, focusing on improving search efficiency through their common notion of ``distance''. The first concerns the fundamental query types, $k$-nearest neighbor ($k$-NN) and range queries, which compare distances in terms of nearness. The second concerns a more specialized query type, the reverse furthest neighbor (RFN) query, which instead considers distance in terms of farness. For the former, we propose an original indexing scheme, the ``function index'', which indexes expensive distance functions for efficient query processing in multi-dimensional (even high-dimensional) space. Departing from traditional indexing ideas such as space or data partitioning, we are the first to consider indexing the distance functions themselves. The latter has received little attention in past decades, although it is a valuable query type applicable to real problems; we therefore concentrate on theoretical analysis and algorithm design to improve its query efficiency. Extensive experimental evaluations on both synthetic and real datasets confirm the efficiency of our approaches in comparison with state-of-the-art methods.
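
For concreteness, the three query types named above can be stated as brute-force linear scans. The following is a minimal sketch of such baselines only, not of the function index or the RFN algorithms proposed in this paper. The function names are hypothetical, the Euclidean distance stands in for an arbitrary (possibly expensive) distance function, and the RFN query is assumed to follow the common monochromatic definition: a data point $p$ belongs to the answer if no other data point is farther from $p$ than the query $q$ is.

```python
import heapq
import math


def euclidean(a, b):
    # Stand-in distance; any distance function could be passed instead.
    return math.dist(a, b)


def knn_query(points, q, k, dist=euclidean):
    # k-NN: the k data points nearest to q (linear scan).
    return heapq.nsmallest(k, points, key=lambda p: dist(p, q))


def range_query(points, q, r, dist=euclidean):
    # Range query: all data points within distance r of q.
    return [p for p in points if dist(p, q) <= r]


def rfn_query(points, q, dist=euclidean):
    # RFN (assumed monochromatic definition): data points p for which
    # q is at least as far from p as every data point is.
    result = []
    for p in points:
        d_pq = dist(p, q)
        if all(dist(p, other) <= d_pq for other in points):
            result.append(p)
    return result


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (5, 0), (10, 0)]
    print(knn_query(data, (0.4, 0), 2))
    print(range_query(data, (0, 0), 1.0))
    print(rfn_query(data, (12, 0)))
```

The $k$-NN and range scans evaluate the distance function once per data point, and the RFN scan evaluates it a quadratic number of times, which illustrates why reducing distance computations matters for both sub-problems.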