Ptolemaic indexing

This paper discusses a new family of bounds for similarity search. The bounds are related to those used in metric indexing but are based on Ptolemy’s inequality rather than on the metric axioms. Ptolemy’s inequality holds for the well-known Euclidean distance, and it is shown here to hold for quadratic form metrics in general. In addition, the square root of any metric is Ptolemaic, so the principles introduced in this paper are very widely applicable. The inequality is examined empirically on both synthetic and real-world data sets and is found to hold approximately, with a very low degree of error, for important distances such as the angular pseudometric and several Lp norms. Indexing experiments on several data sets demonstrate substantially greater filtering power for certain forms of Ptolemaic filtering than for existing triangular methods. It is also shown that combining Ptolemaic and triangular filtering can give better results than either approach on its own.
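To make the idea of a Ptolemaic bound concrete, the following minimal sketch compares a triangular lower bound with a pivot-pair lower bound derived from Ptolemy’s inequality, d(x,y)·d(u,v) ≤ d(x,u)·d(y,v) + d(x,v)·d(y,u). Rearranging the inequality for a query q, an object o and two pivots p and s gives d(q,o) ≥ |d(q,p)·d(o,s) − d(q,s)·d(o,p)| / d(p,s). The function names, the use of Euclidean distance, and the brute-force loop over pivot pairs are illustrative assumptions, not the implementation evaluated in the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance, for which Ptolemy's inequality holds."""
    return math.dist(a, b)


def triangle_lower_bound(d_qp, d_op):
    """Triangular bound on d(q, o) from a single pivot p."""
    return abs(d_qp - d_op)


def ptolemaic_lower_bound(d_qp, d_qs, d_op, d_os, d_ps):
    """Pivot-pair bound on d(q, o) obtained by rearranging Ptolemy's inequality.

    Requires d_ps > 0, i.e. the two pivots p and s must be distinct.
    """
    return abs(d_qp * d_os - d_qs * d_op) / d_ps


def best_lower_bounds(q, o, pivots, dist=euclidean):
    """Return the tightest (triangular, Ptolemaic) lower bounds over all pivots.

    Hypothetical helper: a real index would precompute the object-to-pivot
    and pivot-to-pivot distances instead of recomputing them per query.
    """
    tri = max(triangle_lower_bound(dist(q, p), dist(o, p)) for p in pivots)
    ptol = 0.0
    for i, p in enumerate(pivots):
        for s in pivots[i + 1:]:
            d_ps = dist(p, s)
            if d_ps > 0:  # skip coincident pivots
                bound = ptolemaic_lower_bound(
                    dist(q, p), dist(q, s), dist(o, p), dist(o, s), d_ps
                )
                ptol = max(ptol, bound)
    return tri, ptol
```

In a range query with radius r, an object can be discarded without computing d(q,o) whenever either bound exceeds r; taking the larger of the two bounds is one simple way to combine triangular and Ptolemaic filtering.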
