The NOBH-tree: Improving in-memory metric access methods by using metric hyperplanes with non-overlapping nodes

To speed up similarity query evaluation, index structures divide the target dataset into subsets so that a query can be answered without examining the entire dataset. As the data types handled by modern applications grow more complex, searching by similarity becomes increasingly important, which makes metric space theory the theoretical basis for the structures used to index complex data. In addition, as main memory capacity grows and its price decreases, increasingly large databases can be indexed entirely in main memory. Thus, more and more applications require database management systems to quickly build indexes that take advantage of main memory. In this paper, we propose a new family of metric access methods (MAMs), called NOBH-trees, that allow a non-overlapping division of the data space by combining Voronoi-shaped and ball-shaped regions to partition the metric space. We show how to query the subspaces divided by the hyperplanes and how the distance from any element to the hyperplane can be evaluated. Experimental results show that the new MAM outperforms existing ones during both the construction and querying phases.
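As a rough illustration of how a metric hyperplane between two pivots can be used to discard a region during a range query, the sketch below applies the standard triangle-inequality lower bound for generalized hyperplane partitioning. It is not taken from the paper: the function names, the Euclidean example metric, and the pivot coordinates are all assumptions made for illustration, and the NOBH-tree's own distance evaluation may differ.

```python
import math


def euclidean(a, b):
    # Example metric only; any function satisfying the metric axioms would do.
    return math.dist(a, b)


def region_lower_bound(q, pivot_region, pivot_other, dist=euclidean):
    """Lower bound on d(q, o) for every o that is at least as close to
    pivot_region as to pivot_other.

    By the triangle inequality,
        d(q, o) >= (d(q, pivot_region) - d(q, pivot_other)) / 2,
    which is positive only when q lies on the other pivot's side.
    """
    gap = dist(q, pivot_region) - dist(q, pivot_other)
    return max(0.0, gap / 2.0)


def can_prune_region(q, radius, pivot_region, pivot_other, dist=euclidean):
    # A range query ball (q, radius) cannot reach the region if the
    # lower bound on the distance to that region exceeds the radius.
    return region_lower_bound(q, pivot_region, pivot_other, dist) > radius


# Hypothetical pivots and query, chosen only to exercise the functions.
p_a, p_b = (0.0, 0.0), (10.0, 0.0)
q = (1.0, 0.0)
bound_b = region_lower_bound(q, p_b, p_a)  # bound for the region of p_b
prune_b_small = can_prune_region(q, 3.0, p_b, p_a)
prune_b_large = can_prune_region(q, 5.0, p_b, p_a)
prune_a = can_prune_region(q, 3.0, p_a, p_b)
```

In this sketch the query's own side is never pruned, because its bound clamps to zero; only the region on the far side of the hyperplane can be skipped, and only when the query radius is smaller than the bound.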
