BlockB-Tree: A new index structure combined compact B+-tree with block distance

To overcome the "curse of dimensionality", efficient index structures have been proposed that map high-dimensional data to one-dimensional values. However, none of these index structures directly supports similarity search under block distance. Block distance is a widely used similarity measure in content-based image retrieval (CBIR); it is simple to compute and offers excellent query performance. This paper combines the two techniques and proposes the BlockB-Tree. The BlockB-Tree uses block distance to map high-dimensional feature data to one-dimensional key values and then manages these keys with a compact B+-tree. It directly supports similarity search under block distance and also effectively supports similarity search under Euclidean distance.
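The mapping idea can be illustrated with a minimal sketch. It is not the paper's implementation, and several choices here are assumptions: the class name `BlockKeyIndex` is hypothetical, the key is taken to be the block (L1) distance to a single reference point (the origin by default), and a sorted Python list searched with `bisect` stands in for the compact B+-tree. Under those assumptions, the triangle inequality restricts a block-distance range query of radius r to keys within r of the query's key. For Euclidean queries the sketch relies on the standard bound L1 ≤ √d · L2, so the key window is widened to √d · r before exact filtering.

```python
import bisect
import math


def block_distance(a, b):
    """Block (L1, Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class BlockKeyIndex:
    """Hypothetical stand-in: sorted 1-D keys instead of a compact B+-tree."""

    def __init__(self, points, reference=None):
        dim = len(points[0])
        # Assumption: one reference point, the origin unless given.
        self.reference = reference if reference is not None else [0.0] * dim
        self.points = points
        entries = sorted(
            (block_distance(p, self.reference), i) for i, p in enumerate(points)
        )
        self.keys = [k for k, _ in entries]
        self.ids = [i for _, i in entries]

    def _candidates(self, query, key_radius):
        # Triangle inequality: |key(p) - key(q)| <= L1(p, q),
        # so only keys inside this window can match.
        qk = block_distance(query, self.reference)
        lo = bisect.bisect_left(self.keys, qk - key_radius)
        hi = bisect.bisect_right(self.keys, qk + key_radius)
        return self.ids[lo:hi]

    def range_block(self, query, radius):
        """Ids of points whose block distance to query is <= radius."""
        return sorted(
            i for i in self._candidates(query, radius)
            if block_distance(self.points[i], query) <= radius
        )

    def range_euclidean(self, query, radius):
        """Ids of points whose Euclidean distance to query is <= radius."""
        # L1 <= sqrt(d) * L2, so widen the key window accordingly.
        key_radius = math.sqrt(len(query)) * radius
        return sorted(
            i for i in self._candidates(query, key_radius)
            if math.dist(self.points[i], query) <= radius
        )


points = [[0, 0], [1, 1], [3, 4], [10, 10]]
index = BlockKeyIndex(points)
block_hits = index.range_block([1, 1], 2)
euclid_hits = index.range_euclidean([3, 3], 1.0)
```

In both query types the key window only prunes candidates; the exact distance check on the surviving points decides the final result, so the sketch returns the same answer as a linear scan.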
