Approximate Subgraph Matching Query over Large Graph

Approximate subgraph matching queries are increasingly used to search labeled, heterogeneous networks with millions of vertices and edges. Such networks are usually noisy and lack a fixed schema. Earlier exact subgraph matching (such as subgraph isomorphism) and approximate matching methods designed for small, proprietary networks are not practical at this scale. Recent approximate subgraph matching methods for large graphs usually sacrifice match accuracy to ensure query efficiency. In this paper, we present a novel approximate subgraph matching query method. We propose a similarity score function to measure the quality of a subgraph match. Based on this function, we adopt a two-step strategy for processing subgraph matching queries: candidate selection and query processing. We also employ an indexing technique to improve query efficiency. We experimentally evaluate the efficiency and effectiveness of our method. The results demonstrate that our method outperforms the state-of-the-art method NeMa, especially in efficiency.
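To make the two-step strategy concrete, the following is a minimal Python sketch of candidate selection followed by score-based query processing. It is not the method of this paper: the abstract does not specify the similarity score function or the index, so the token-based Jaccard label similarity, the score (mean label similarity plus the fraction of query edges preserved), the threshold of 0.5, and all function names are assumptions chosen only for illustration. The brute-force enumeration in the second step is suitable for toy inputs only and omits indexing entirely.

```python
from itertools import product


def label_sim(a, b):
    """Jaccard similarity over lowercase word tokens (an assumed label measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_candidates(q_labels, g_labels, threshold=0.5):
    """Step 1: for each query vertex, keep data vertices with similar labels."""
    return {
        q: [v for v, lbl in g_labels.items() if label_sim(q_lbl, lbl) >= threshold]
        for q, q_lbl in q_labels.items()
    }


def match_score(mapping, q_labels, q_edges, g_labels, g_edges):
    """Assumed score: mean label similarity + fraction of query edges preserved."""
    if not mapping:
        return 0.0
    label_part = sum(
        label_sim(q_labels[q], g_labels[v]) for q, v in mapping.items()
    ) / len(mapping)
    kept = sum(
        1 for a, b in q_edges if frozenset((mapping[a], mapping[b])) in g_edges
    )
    edge_part = kept / len(q_edges) if q_edges else 1.0
    return label_part + edge_part


def best_match(q_labels, q_edges, g_labels, g_edges, threshold=0.5):
    """Step 2: enumerate injective assignments over candidates, keep the best."""
    cands = select_candidates(q_labels, g_labels, threshold)
    q_nodes = list(q_labels)
    best, best_score = None, float("-inf")
    for combo in product(*(cands[q] for q in q_nodes)):
        if len(set(combo)) < len(combo):  # two query vertices on one data vertex
            continue
        mapping = dict(zip(q_nodes, combo))
        score = match_score(mapping, q_labels, q_edges, g_labels, g_edges)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


# Toy data graph (undirected edges stored as frozensets) and a noisy query.
g_labels = {1: "Brad Pitt", 2: "Fight Club", 3: "Brad Garrett", 4: "Troy"}
g_edges = {frozenset((1, 2)), frozenset((1, 4))}
q_labels = {"a": "Brad", "m": "Fight Club"}
q_edges = [("a", "m")]

mapping, score = best_match(q_labels, q_edges, g_labels, g_edges)
```

In this toy example both "Brad Pitt" and "Brad Garrett" survive candidate selection for the query label "Brad", and the edge term in the score is what separates them. A practical system would replace the exhaustive enumeration with index-assisted search.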