Efficient Approximate Indexing in High-Dimensional Feature Spaces

In this paper we present a fast approximate indexing method for high-dimensional feature spaces that uses the error probability as an independent variable. The idea of the algorithm is to define a low-dimensional feature space in which a significant portion of the inter-distance variance is concentrated, to search for the nearest neighborhood of the query in this space, and then to extend the search by a given factor so as to include a number of objects "near" this nearest neighborhood. We show that, under reasonable hypotheses on the distribution of items in the feature space, it is possible to derive a relation between the value of this extension factor and the error probability. We study the error probability and the complexity of the algorithm, validate the model using a data set of images, and show how the results can be used to design indexing schemes.
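The search scheme outlined above can be illustrated with a minimal sketch. It is not the construction developed in the paper but one possible reading of the abstract, under several assumptions: the low-dimensional space is obtained by principal component analysis, the extension factor enlarges the radius of the nearest neighbor found in the reduced space multiplicatively, and the candidates inside that radius are re-ranked with the full-dimensional Euclidean distance. The names `fit_projection`, `approximate_nn`, and `factor`, as well as the synthetic data, are hypothetical and introduced only for illustration.

```python
import numpy as np


def fit_projection(data, k):
    """Return the data mean and k principal directions (as columns).

    Assumption: PCA stands in for the paper's low-dimensional space.
    """
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k].T


def approximate_nn(data, reduced, mean, basis, query, factor):
    """Return (index of the approximate nearest neighbor, candidates checked)."""
    q_low = (query - mean) @ basis
    d_low = np.linalg.norm(reduced - q_low, axis=1)
    # Assumption: the search radius is the reduced-space nearest-neighbor
    # distance enlarged by (1 + factor).
    radius = d_low.min() * (1.0 + factor)
    candidates = np.flatnonzero(d_low <= radius)
    # Re-rank only the candidates with the full-dimensional distance.
    d_full = np.linalg.norm(data[candidates] - query, axis=1)
    return int(candidates[np.argmin(d_full)]), int(candidates.size)


rng = np.random.default_rng(0)
# Synthetic data whose variance is concentrated in the first four coordinates.
scales = np.concatenate([np.full(4, 10.0), np.full(60, 0.5)])
data = rng.normal(size=(2000, 64)) * scales
mean, basis = fit_projection(data, k=4)
reduced = (data - mean) @ basis

query = rng.normal(size=64) * scales
idx, checked = approximate_nn(data, reduced, mean, basis, query, factor=0.5)
exact = int(np.argmin(np.linalg.norm(data - query, axis=1)))
print("matches exact search:", idx == exact, "| candidates checked:", checked)
```

In this sketch a larger `factor` admits more candidates, which raises the cost of the re-ranking step and lowers the chance of missing the true nearest neighbor; this is the trade-off between complexity and error probability that the abstract refers to.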