Fast Graph Similarity Search via Locality Sensitive Hashing

Similarity search in graph databases has been widely studied in graph query processing in recent years. With the rapid growth of graph databases, it is worthwhile to develop fast algorithms that support similarity search at large scale. In this paper, we study the k-NN similarity search problem via locality sensitive hashing. We propose a fast graph search algorithm that first transforms complex graphs into vectorial representations based on prototypes in the database and then accelerates queries in Euclidean space by employing locality sensitive hashing. Our approach also establishes a general retrieval framework. Experiments on three real datasets show that the proposed algorithm achieves high accuracy and efficiency.
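The two-stage pipeline in the abstract (prototype-based embedding, then Euclidean LSH) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: graphs are hypothetically modelled as a node-label list plus an edge set; `graph_distance` is a cheap hypothetical stand-in for graph edit distance; the prototypes are simply the first database graphs; and the hash family is the standard p-stable (Gaussian) scheme h(v) = floor((a·v + b)/w). The parameters `n_tables`, `n_hashes`, and `width` are illustrative, and candidates are re-ranked by Euclidean distance in the embedded space.

```python
import math
import random
from collections import Counter, defaultdict


def graph_distance(g1, g2):
    """Hypothetical cheap stand-in for graph edit distance: size of the
    node-label multiset difference plus the difference in edge counts."""
    (labels1, edges1), (labels2, edges2) = g1, g2
    c1, c2 = Counter(labels1), Counter(labels2)
    label_diff = sum((c1 - c2).values()) + sum((c2 - c1).values())
    return float(label_diff + abs(len(edges1) - len(edges2)))


def embed(g, prototypes):
    """Map a graph to R^m: its distance to each of the m prototype graphs."""
    return [graph_distance(g, p) for p in prototypes]


class PStableLSH:
    """Euclidean LSH with 2-stable (Gaussian) projections:
    h(v) = floor((a . v + b) / w); n_hashes values form one table key."""

    def __init__(self, dim, n_tables=4, n_hashes=3, width=2.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # For each table: n_hashes pairs (a ~ N(0, I), b ~ U[0, width)).
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(n_hashes)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def _key(self, t, v):
        # Concatenate the quantized projections of v for table t.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
            for a, b in self.funcs[t]
        )

    def insert(self, v):
        idx = len(self.points)
        self.points.append(v)
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)
        return idx

    def query(self, v, k=1):
        """Union the colliding buckets over all tables, then return up to k
        (Euclidean distance, index) pairs, nearest first."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, v), []))
        ranked = sorted((math.dist(v, self.points[i]), i) for i in candidates)
        return ranked[:k]


# Toy usage with made-up labelled graphs.
db = [
    (["C", "C", "O"], {(0, 1), (1, 2)}),
    (["C", "N"], {(0, 1)}),
    (["C", "C", "C", "O"], {(0, 1), (1, 2), (2, 3)}),
    (["N", "N", "O", "O"], {(0, 1), (2, 3)}),
]
prototypes = db[:2]  # hypothetical choice: first two graphs act as prototypes
lsh = PStableLSH(dim=len(prototypes))
for g in db:
    lsh.insert(embed(g, prototypes))

query = (["C", "C", "O"], {(0, 1), (1, 2)})
hits = lsh.query(embed(query, prototypes), k=2)
print(hits)
```

An identical query hashes to the same bucket in every table, so its exact match is always returned first; other neighbours are found only if they collide in at least one table, which is why several tables are used. Raising `width` or `n_tables` tends to increase recall at the cost of larger candidate sets, while raising `n_hashes` makes each table more selective.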
