VD-tree: how to build an efficient and fit metric access method using Voronoi diagrams

Efficient similarity search is a core requirement for retrieval operations over large volumes of complex data, and it often relies on Metric Access Methods (MAMs) to speed up range and k-NN queries. Among the most widely used MAMs are those based on covering radii, which create balanced structures and enable efficient data retrieval and dynamic maintenance. However, MAMs typically suffer from node overlap, which increases retrieval costs. Some strategies aim to reduce node overlap by employing global pivots to improve filtering during queries, but they incur significant costs to maintain the pivots and do not completely remove the overlaps, which impacts queries over large databases. Other strategies use hyperplane-based MAMs, which can eliminate overlaps but at a high cost to create and update the index. We propose VD-Tree, a MAM that combines a covering radius strategy with a Voronoi-like organization. VD-Tree retains index flexibility for updates while reducing node overlap by dynamically swapping elements among nodes. The method relies only on the solid organization fostered by the Voronoi-like partitioning and does not require storing additional information in the tree. Experimental analysis using five real-world image datasets and four feature extractors shows that VD-Tree reduced node overlap by up to 43% and the average time needed to answer similarity queries by up to 28% when compared to its closest competitor.
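To make the covering-radius idea concrete, the following is a minimal sketch and not the VD-Tree algorithm itself. It assumes a flat, single-level set of nodes, a Euclidean metric, and fixed pivots; the names `build_nodes` and `range_query` are hypothetical. Each element is assigned to its nearest pivot (a Voronoi-like assignment), each node records the covering radius of its elements, and a range query skips any node whose ball cannot intersect the query ball, which follows from the triangle inequality.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as an example metric."""
    return math.dist(a, b)


def build_nodes(points, pivots, dist=euclidean):
    """Assign each point to its nearest pivot and track covering radii.

    Assumption: pivots are given in advance; a real MAM chooses and
    updates them dynamically.
    """
    nodes = [{"pivot": p, "radius": 0.0, "items": []} for p in pivots]
    for x in points:
        # Voronoi-like assignment: the closest pivot owns the point.
        node = min(nodes, key=lambda n: dist(x, n["pivot"]))
        node["items"].append(x)
        # The covering radius is the largest pivot-to-item distance.
        node["radius"] = max(node["radius"], dist(x, node["pivot"]))
    return nodes


def range_query(nodes, q, r, dist=euclidean):
    """Return items within distance r of q and the number of nodes visited."""
    hits, visited = [], 0
    for n in nodes:
        # Triangle inequality: if d(q, pivot) > r + covering radius,
        # no item in this node can lie within r of q, so prune it.
        if dist(q, n["pivot"]) > r + n["radius"]:
            continue
        visited += 1
        hits.extend(x for x in n["items"] if dist(q, x) <= r)
    return hits, visited
```

In this sketch, the more the node balls overlap, the fewer nodes the pruning test can discard for a given query, which is why overlap raises retrieval cost.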
