Data Independent Method of Constructing Distributed LSH for Large-Scale Dynamic High-Dimensional Indexing

Constructing effective and efficient indexes for explosively growing multimedia data is a challenging problem. To address it, Haghani et al. proposed a distributed similarity search method for high-dimensional data based on Locality Sensitive Hashing (LSH). However, their method must estimate a global parameter over the whole dataset beforehand, which is impractical for a large-scale dynamic dataset. This paper proposes a novel method of constructing distributed LSH that requires no a priori knowledge of the dataset. By generating hash functions with a consistent output distribution, we obtain a theoretically data-independent prediction model that guarantees good load balance even when the dataset changes dynamically. Furthermore, we modify the query algorithm of basic LSH to make the proposed model more practical. Experimental results on two public large-scale high-dimensional datasets show that the proposed method is more robust, scalable, and practical than the state of the art.
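For readers unfamiliar with the "basic LSH" the abstract builds on, the following is a minimal sketch of a standard p-stable (Gaussian projection) LSH hash, h(v) = floor((a·v + b) / w). It is an illustration only and not the construction proposed in this paper; the names `make_hash` and `lsh_key`, the bucket width `w = 4.0`, and the dimensions used are assumptions chosen for the example.

```python
import math
import random


def make_hash(dim, w, rng):
    """Build one p-stable LSH function h(v) = floor((a . v + b) / w).

    `a` is drawn from a standard Gaussian and `b` uniformly from [0, w).
    The bucket width `w` is an illustrative assumption.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h


def lsh_key(hashes, v):
    """Concatenate k hash values into one bucket key."""
    return tuple(h(v) for h in hashes)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
dim, k, w = 8, 4, 4.0           # hypothetical settings
hashes = [make_hash(dim, w, rng) for _ in range(k)]

point = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
key = lsh_key(hashes, point)    # bucket key: a tuple of k integers
```

Because each hash value is the projection scaled by `w`, the spread of bucket keys in this basic scheme depends on the scale of the input vectors. That dependence is one reason a distributed deployment may need dataset-wide statistics, which is the requirement the abstract sets out to remove.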