In-Memory Subgraph Matching: An In-depth Study

We study the performance of eight representative in-memory subgraph matching algorithms. Specifically, we put QuickSI, GraphQL, CFL, CECI, DP-iso, RI and VF2++ in a common framework and compare them on four aspects: (1) the method of filtering candidate vertices in the data graph; (2) the method of ordering query vertices; (3) the method of enumerating partial results; and (4) other optimization techniques. We then compare the overall performance of these algorithms with that of Glasgow, an algorithm based on constraint programming. Through experiments, we find that (1) the filtering method of GraphQL is competitive with that of the latest algorithms CFL, CECI and DP-iso in terms of pruning power; (2) the ordering methods of GraphQL and RI are usually the most effective; (3) the set-intersection-based local candidate computation in CECI and DP-iso performs best in enumeration; and (4) the failing-set pruning in DP-iso can significantly improve performance as queries become large. Our source code is publicly available at https://github.com/RapidsAtHKUST/SubgraphMatching.
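To make the enumeration aspect concrete, the following is a minimal Python sketch of set-intersection-based local candidate computation inside a backtracking search. It is an illustration under stated assumptions, not the implementation of CECI, DP-iso, or the released framework: the names `local_candidates` and `enumerate_matches`, the adjacency-set graph representation, the simple label-and-degree filter, and the fixed matching order are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # vertex -> set of neighbors (undirected)


def local_candidates(u: int, query: Graph, data: Graph,
                     cands: Dict[int, Set[int]],
                     mapping: Dict[int, int]) -> Set[int]:
    """Candidates for query vertex u under the current partial mapping.

    Start from the filtered candidate set of u, then intersect it with the
    data-graph neighbor set of every already-mapped query neighbor of u.
    """
    result = set(cands[u])
    for w in query[u]:
        if w in mapping:
            result &= data[mapping[w]]
    # Enforce injectivity: a data vertex may be used at most once.
    return result - set(mapping.values())


def enumerate_matches(query: Graph, data: Graph,
                      q_labels: Dict[int, str], d_labels: Dict[int, str],
                      order: List[int]) -> List[Dict[int, int]]:
    """Enumerate all (non-induced) subgraph isomorphisms from query to data."""
    # Hypothetical filter: same label and sufficiently large degree.
    cands = {
        u: {v for v in data
            if d_labels[v] == q_labels[u] and len(data[v]) >= len(query[u])}
        for u in query
    }
    matches: List[Dict[int, int]] = []

    def extend(depth: int, mapping: Dict[int, int]) -> None:
        if depth == len(order):
            matches.append(dict(mapping))
            return
        u = order[depth]
        for v in sorted(local_candidates(u, query, data, cands, mapping)):
            mapping[u] = v
            extend(depth + 1, mapping)
            del mapping[u]

    extend(0, {})
    return matches


# Example: a labeled triangle query against a small data graph.
query = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
q_labels = {0: "A", 1: "B", 2: "C"}
data = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}
d_labels = {0: "A", 1: "B", 2: "C", 3: "C", 4: "A"}
print(enumerate_matches(query, data, q_labels, d_labels, order=[0, 1, 2]))
```

In this sketch the intersection removes data vertex 3 as a candidate for query vertex 2: it passes the label-and-degree filter, but it is not adjacent to the data vertex already matched to query vertex 0. A production system would typically use sorted arrays or precomputed index structures rather than Python sets.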
