Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together

Subgraph matching (or subgraph isomorphism) is one of the fundamental problems in graph analysis, and extensive research has gone into developing practical solutions for it. State-of-the-art algorithms such as CFL-Match and Turbo_iso convert a query graph into a spanning tree, which they use both to obtain candidates for each query vertex and to derive a good matching order. However, using the spanning tree instead of the original query graph can lead to lower pruning power and a sub-optimal matching order. Another limitation is that these algorithms perform redundant computation during search because they do not utilize knowledge learned from past computation. In this paper, we introduce three novel concepts to address these inherent limitations: 1) dynamic programming between a directed acyclic graph (DAG) and a graph, 2) adaptive matching order with DAG ordering, and 3) pruning by failing sets, which together lead to a much faster algorithm, DAF, for subgraph matching. Extensive experiments with real datasets show that DAF outperforms the fastest existing solution by up to orders of magnitude in terms of recursive calls as well as elapsed time.
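To make the problem statement concrete, the following is a minimal sketch of plain backtracking subgraph matching on labeled, undirected graphs. It is not DAF and not the authors' code: it uses a fixed matching order and includes none of the three techniques named above (DAG-based dynamic programming, adaptive matching order, failing-set pruning). The function name `subgraph_matches`, the adjacency-set representation, and the toy graphs are all assumptions made for illustration.

```python
from typing import Dict, List, Set

Adjacency = Dict[int, Set[int]]  # vertex -> set of neighbours (undirected)
Labels = Dict[int, str]          # vertex -> label


def subgraph_matches(q_adj: Adjacency, q_label: Labels,
                     g_adj: Adjacency, g_label: Labels) -> List[Dict[int, int]]:
    """Enumerate all embeddings of query q in data graph g.

    An embedding is an injective map from query vertices to data vertices
    that preserves labels and maps every query edge to a data edge.
    Baseline only: static order, no candidate filtering beyond labels.
    """
    order = sorted(q_adj)            # hypothetical fixed matching order
    mapping: Dict[int, int] = {}     # partial embedding built so far
    used: Set[int] = set()           # data vertices already taken
    results: List[Dict[int, int]] = []

    def extend(i: int) -> None:
        if i == len(order):          # every query vertex is mapped
            results.append(dict(mapping))
            return
        u = order[i]
        for v in g_adj:
            if v in used or g_label[v] != q_label[u]:
                continue
            # Each already-mapped neighbour of u must map to a neighbour of v.
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(i + 1)        # one recursive call per extension
                del mapping[u]
                used.discard(v)

    extend(0)
    return results


# Toy query: a triangle with labels A, B, C.
q_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
q_label = {0: "A", 1: "B", 2: "C"}

# Toy data graph: the same triangle plus a second C-vertex (3) adjacent to 0 and 1.
g_adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
g_label = {0: "A", 1: "B", 2: "C", 3: "C"}

matches = subgraph_matches(q_adj, q_label, g_adj, g_label)
print(len(matches))  # 2
```

Each call to `extend` corresponds to one recursive call, the quantity the experiments above report alongside elapsed time. A baseline like this explores many partial embeddings that cannot be completed, which is the cost that better candidate filtering, matching orders, and pruning aim to reduce.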
