Compressing the Index - A Simple and yet Efficient Approximation Approach to High-Dimensional Indexing

An efficient, tunable high-dimensional indexing scheme called iMinMax(θ) was proposed to map high-dimensional data points to single-dimensional values based on the minimum or maximum value among all dimensions [7]. Unfortunately, the number of leaf nodes that need to be scanned remains large. To reduce the number of leaf nodes, we propose using the compression technique of the Vector Approximation File (VA-file) [10] to represent vectors. We call the hybrid method iMinMax(θ)*. While the marriage is straightforward, the gain in performance is significant. Our extensive performance study clearly indicates that iMinMax(θ)* outperforms both the original iMinMax(θ) index scheme and the VA-file. iMinMax(θ)* is also attractive from a practical viewpoint, as its implementation cost is only slightly higher than that of the original iMinMax(θ). The approximation concept incorporated in iMinMax(θ)* can be integrated into other high-dimensional index structures without much difficulty.
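
To make the combination concrete, the following is a minimal sketch of how a leaf entry might be formed. It is not the paper's implementation. It assumes points normalized to [0, 1]^d, the mapping rule as it is commonly stated for iMinMax(θ) (use the minimum edge when x_min + θ < 1 - x_max, otherwise the maximum edge), and a uniform grid with the same number of bits per dimension as a simplified stand-in for the VA-file approximation. The function names `iminmax_key`, `va_approximation`, and `build_leaf_entries` are hypothetical.

```python
from typing import List, Sequence, Tuple


def iminmax_key(point: Sequence[float], theta: float = 0.0) -> float:
    """Map a point in [0, 1]^d to one value (assumed iMinMax-style rule)."""
    # Index of the dimension holding the smallest and the largest coordinate.
    d_min = min(range(len(point)), key=lambda j: point[j])
    d_max = max(range(len(point)), key=lambda j: point[j])
    x_min, x_max = point[d_min], point[d_max]
    # theta tunes how strongly the mapping favors the maximum edge.
    if x_min + theta < 1.0 - x_max:
        return d_min + x_min
    return d_max + x_max


def va_approximation(point: Sequence[float], bits: int = 4) -> List[int]:
    """Quantize each coordinate to a cell number on a uniform grid
    (assumption: equal-width cells, same bit budget for every dimension)."""
    cells = 1 << bits
    # Clamp so that a coordinate of exactly 1.0 falls into the last cell.
    return [min(int(x * cells), cells - 1) for x in point]


def build_leaf_entries(
    points: Sequence[Sequence[float]], theta: float = 0.0, bits: int = 4
) -> List[Tuple[float, List[int]]]:
    """Pair each one-dimensional key with a compact approximation instead of
    the full vector, sorted by key as a B+-tree leaf level would be."""
    entries = [(iminmax_key(p, theta), va_approximation(p, bits)) for p in points]
    entries.sort(key=lambda e: e[0])
    return entries
```

Because each leaf entry stores a few bits per dimension rather than a full vector, more entries fit in a leaf node, which is the intuition behind scanning fewer leaf nodes. In such a design the exact vectors would still need to be fetched to refine candidates that survive the approximate filter.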