A Novel High-Dimensional Index Method Based on the Mathematical Features

Nearest neighbor (NN) search in high-dimensional spaces is applied in many fields and has become a focus of information science. In practice, NN search is often replaced by R-near neighbor search, which fixes a query range R. However, traditional methods for R-near neighbor search cannot achieve satisfactory performance in high-dimensional spaces because of the curse of dimensionality. Moreover, some methods rely on probabilistic guarantees and therefore cannot guarantee 100% accuracy. To address these problems, we propose a novel way to build the index structure, based on mathematical features of the coordinates of the data points. Specifically, we use the mean and the standard deviation of each data point's coordinates to index that point. This method efficiently solves R-near neighbor search in high-dimensional spaces with a 100% accuracy guarantee. Extensive experimental results demonstrate the effectiveness of the proposed method.
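The abstract does not describe the index structure itself, so the following is only a minimal sketch of how per-point mean and standard deviation could support exact R-near neighbor filtering; it is not the authors' algorithm. It assumes Euclidean distance and the population standard deviation. Under those assumptions, for two d-dimensional points x and q the distance satisfies ||x - q|| >= sqrt(d) * sqrt((mean_x - mean_q)^2 + (std_x - std_q)^2), so any point whose lower bound exceeds R can be discarded without computing the full distance, and no true result is lost. The function names (`features`, `build_index`, `r_near_search`) and the tolerance `EPS` are hypothetical.

```python
import math

EPS = 1e-9  # assumed tolerance so rounding error never prunes a true neighbor


def features(point):
    """Return (mean, population standard deviation) of a point's coordinates."""
    d = len(point)
    mean = sum(point) / d
    var = sum((c - mean) ** 2 for c in point) / d
    return mean, math.sqrt(var)


def build_index(points):
    """Precompute the (mean, std) pair for every data point."""
    return [features(p) for p in points]


def r_near_search(points, index, query, radius):
    """Return indices of all points within `radius` of `query` (Euclidean)."""
    d = len(query)
    q_mean, q_std = features(query)
    hits = []
    for i, (mean, std) in enumerate(index):
        # Lower bound on the true distance, derived from the two features.
        lower = math.sqrt(d) * math.hypot(mean - q_mean, std - q_std)
        if lower > radius + EPS:
            continue  # cannot be within the radius, so skip the full check
        # Candidate survived the filter: verify with the exact distance.
        if math.dist(points[i], query) <= radius:
            hits.append(i)
    return hits
```

This sketch scans the feature pairs linearly. Because each point reduces to a two-dimensional key, the pairs could instead be stored in any low-dimensional range index so that the candidate set is retrieved without a full scan; that choice is left open here.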
