Speed up linear scan in high-dimensions by sorting one-dimensional projections

High-dimensional indexing is a pervasive challenge faced in multimedia retrieval. Existing indexing methods applying linear scan strategy, such as VA-file and its variations, are still efficient when the dimensionality is high. In this paper, we propose a new access idea implemented on linear scan based methods to speed up the nearest-neighbor queries. The idea is to map high- dimensional points into two kinds of one-dimensional values using projection and distance computation. The projection values on the line determined by the first Principal Component are sorted and indexed using a B + -tree, and the distances of each point to a reference point are also embedded into leaf node of the B + -tree. When performing nearest neighbor search, the Partial Distortion Searching and triangular inequality are employed to prune search space. In the new search algorithm, only a small portion of data points need to be linearly accessed by computing the bounded distance on the one-dimensional line, which can reduce the I/O and processor time dramatically. Experiment results on large image databases show that the new access method provides a faster search speed than existing high-dimensional index methods.

[1]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[2]  Christian Böhm,et al.  Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases , 2001, CSUR.

[3]  Christian Böhm,et al.  Independent quantization: an index compression technique for high-dimensional data spaces , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[4]  Thomas S. Huang,et al.  Supporting similarity queries in MARS , 1997, MULTIMEDIA '97.

[5]  Ambuj K. Singh,et al.  Dimensionality Reduction for Similarity Searching in Dynamic Databases , 1999, Comput. Vis. Image Underst..

[6]  Aoying Zhou,et al.  An adaptive and dynamic dimensionality reduction method for high-dimensional indexing , 2007, The VLDB Journal.

[7]  Beng Chin Ooi,et al.  Towards effective indexing for very large video sequence database , 2005, SIGMOD '05.

[8]  Hans-Peter Kriegel,et al.  The pyramid-technique: towards breaking the curse of dimensionality , 1998, SIGMOD '98.

[9]  Jiangtao Cui,et al.  Efficient high-dimensional indexing by sorting principal component , 2007, Pattern Recognit. Lett..

[10]  Chin-Wan Chung,et al.  The GC-tree: a high-dimensional index structure for similarity search in image databases , 2002, IEEE Trans. Multim..

[11]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[12]  Divyakant Agrawal,et al.  High dimensional nearest neighbor searching , 2006, Inf. Syst..

[13]  Beng Chin Ooi,et al.  iDistance: An adaptive B+-tree based indexing method for nearest neighbor search , 2005, TODS.

[14]  B. S. Manjunath,et al.  Texture Features for Browsing and Retrieval of Image Data , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Sharad Mehrotra,et al.  Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces , 2000, VLDB.

[16]  Xiang Lian,et al.  A General Cost Model for Dimensionality Reduction in High Dimensional Spaces , 2007, 2007 IEEE 23rd International Conference on Data Engineering.