A survey on graph-based methods for similarity searches in metric spaces

Abstract. Technological development has accelerated the growth of complex data, such as images, videos, time series, and georeferenced data. Similarity search is a widely used approach for retrieving complex data: it returns the items most similar to a query according to intrinsic characteristics of the data. To support similarity searches, large data collections must be organized so that similar items can be retrieved efficiently. Many access methods have been proposed in the literature to speed up similarity retrieval from large databases. Recently, graph-based methods have emerged as a very efficient alternative for similarity retrieval, with reports indicating that they outperform non-graph-based methods in several scenarios. However, to the best of our knowledge, no previous work provides an experimental analysis of a comprehensive set of graph-based methods using the same search algorithm and execution environment. Our main contribution is a survey of graph-based methods used for similarity searches. We review these methods (types of graphs and search algorithms) and discuss in detail the applicability of search algorithms (with exact or approximate answers) to each graph type. Our main focus is on static methods in metric spaces. The survey also includes an experimental evaluation of representative graphs implemented on a common platform. We evaluate the relative performance of these graphs with respect to the main construction and query parameters on a variety of real-world datasets. We also report results on synthetic datasets that evaluate the performance of different graph types according to different dataset features. Our experimental results reinforce the tradeoff between graph construction cost and search performance, which depends on the construction and search parameters.
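To make the idea of a graph-based similarity search concrete, the following is a minimal sketch, not the implementation evaluated in the survey. It assumes Euclidean distance as the metric, a brute-force k-nearest-neighbor graph as the graph type, and a simple greedy traversal as the approximate search algorithm. All function names (`build_knn_graph`, `greedy_search`, `exact_nn`) are hypothetical and chosen for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; any metric could be substituted here."""
    return math.dist(a, b)


def build_knn_graph(points, k, dist=euclidean):
    """Brute-force k-NN graph: each vertex links to its k closest other points."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        graph[i] = others[:k]
    return graph


def greedy_search(points, graph, query, start, dist=euclidean):
    """Approximate nearest-neighbor search by greedy graph traversal.

    Starting at `start`, repeatedly move to the neighbor closest to the
    query; stop when no neighbor is closer than the current vertex.
    The result is a local minimum, so it may differ from the exact answer.
    """
    current = start
    current_d = dist(query, points[current])
    while True:
        best, best_d = current, current_d
        for nb in graph[current]:
            d = dist(query, points[nb])
            if d < best_d:
                best, best_d = nb, d
        if best == current:
            return current, current_d
        current, current_d = best, best_d


def exact_nn(points, query, dist=euclidean):
    """Exact nearest neighbor by sequential scan, used as a baseline."""
    return min(range(len(points)), key=lambda i: dist(query, points[i]))


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(200)]
    g = build_knn_graph(pts, k=5)
    idx, d = greedy_search(pts, g, (0.5, 0.5), start=0)
    print("greedy result:", idx, "distance:", d)
    print("exact result:", exact_nn(pts, (0.5, 0.5)))
```

The sketch shows the tradeoff in miniature: a larger `k` makes construction more expensive and each search step costlier, but gives the traversal more edges to follow and so makes it less likely to stop at a poor local minimum.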
