Storing data once in M-trees and PM-trees: Revisiting the building principles of metric access methods

Since the introduction of the M-tree, a fundamental tree-based data structure for indexing multi-dimensional information, several structural enhancements have been proposed. One of the most effective is the use of additional global pivots, which resulted in the PM-tree. Both indexing structures, however, can store the same data element in multiple nodes. In this article, we revisit the M-tree and the PM-tree to propose a new construction algorithm that stores each data element only once in the tree hierarchy. The main challenge is to properly select data elements when an inner node split is needed. To address it, we propose an approach based on aggregate nearest neighbor queries. The new algorithms enable building the search result set as data elements are evaluated for pruning during traversal, allowing faster retrieval for k-nearest neighbor and range searches. We conducted an extensive set of experiments with different real datasets. The results show that our proposed algorithms perform considerably better than the standard M-tree and PM-tree.

This work has been supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior Brasil (CAPES), Finance Code 001, and by the Brazilian National Council for Scientific and Technological Development (CNPq). Preprint submitted to Information Systems, September 3, 2021.
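To make the notion of an aggregate nearest neighbor query concrete, the following is a minimal sketch by linear scan: given a set of query elements, it returns the candidate that minimizes an aggregate (here the sum, optionally the maximum) of its distances to all of them. The function names, the Euclidean metric, and the choice of aggregates are assumptions made for illustration. The abstract does not specify how the article applies such queries when selecting elements at an inner node split, so this is not the authors' algorithm.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric could be substituted."""
    return math.dist(a, b)


def aggregate_nn(
    candidates: Sequence[Point],
    query_set: Sequence[Point],
    dist: Callable[[Point, Point], float] = euclidean,
    agg: Callable[[Iterable[float]], float] = sum,
) -> Tuple[Point, float]:
    """Return the candidate with the smallest aggregate distance to query_set.

    Hypothetical helper: a brute-force scan over all candidates, with no
    index and no pruning.
    """
    if not candidates or not query_set:
        raise ValueError("candidates and query_set must be non-empty")

    def cost(c: Point) -> float:
        # Aggregate the distances from candidate c to every query element.
        return agg(dist(c, q) for q in query_set)

    best = min(candidates, key=cost)
    return best, cost(best)
```

Passing `agg=max` instead of the default `sum` selects the candidate whose farthest query element is closest, which is another common aggregate for this kind of query.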
