PANDA: toward partial topology-based search on large networks in a single machine

A large body of research has focused on efficient and scalable processing of subgraph search queries on large networks. In these efforts, a query is posed in the form of a connected query graph. Unfortunately, in practice end users may not always have precise knowledge of the topological relationships between nodes in a query graph that are needed to formulate a connected query. In this paper, we present a novel graph querying paradigm called partial topology-based network search and propose a query processing framework called PANDA to efficiently process partial topology queries (PTQs) on a single machine. A PTQ is a disconnected query graph containing multiple connected query components. PTQs allow an end user to formulate queries without demanding precise information about the complete topology of a query graph. To this end, we propose an exact and an approximate algorithm, called SEN-PANDA and PO-PANDA respectively, to generate the top-k matches of a PTQ. We also present a subgraph simulation-based optimization technique to further speed up the processing of PTQs. Using real-life networks with millions of nodes, we experimentally verify that our proposed algorithms are superior to several baseline techniques.
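The abstract does not specify PANDA's data structures or the details of its subgraph simulation technique. As a minimal sketch under our own assumptions, the Python below represents a query as a labeled directed graph (a hypothetical dict-of-labels plus set-of-edges representation), splits it into its connected query components (a PTQ has more than one), and prunes candidate data nodes for each query node using the standard maximal graph simulation relation, computed by a naive fixpoint. This is plain graph simulation used as a candidate filter, not necessarily the paper's optimization, and it does not compute top-k matches.

```python
from collections import defaultdict

# Hypothetical representation (not from the paper):
#   labels: dict mapping node -> label
#   edges:  set of (src, dst) pairs of a directed graph


def query_components(nodes, edges):
    """Split a query graph into weakly connected components.
    A query with more than one component is disconnected, like a PTQ."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # iterative DFS over undirected adjacency
            n = stack.pop()
            comp.add(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        comps.append(comp)
    return comps


def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Naive fixpoint for the maximal graph simulation relation.
    sim[u] keeps data nodes v with u's label such that for every query
    edge (u, u2) some data edge (v, v2) has v2 in sim[u2]."""
    g_succ = defaultdict(set)
    for a, b in g_edges:
        g_succ[a].add(b)
    q_succ = defaultdict(set)
    for a, b in q_edges:
        q_succ[a].add(b)
    # Start from label-compatible candidates only.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}
    changed = True
    while changed:  # repeat until no candidate set shrinks
        changed = False
        for u in q_labels:
            for u2 in q_succ[u]:
                keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True
    return sim


if __name__ == "__main__":
    # Toy PTQ: component {a -> b} plus an isolated component {c}.
    q_labels = {"a": "A", "b": "B", "c": "C"}
    q_edges = {("a", "b")}
    g_labels = {1: "A", 2: "B", 3: "A", 4: "C", 5: "B"}
    g_edges = {(1, 2), (3, 4), (4, 5)}
    print(query_components(list(q_labels), q_edges))
    print(graph_simulation(q_labels, q_edges, g_labels, g_edges))
```

In the toy example, node 3 is pruned as a candidate for `a` because none of its successors is B-labeled, while node 5 remains a candidate for `b`: plain simulation only constrains outgoing edges. A dual-simulation variant would also check incoming edges and would drop node 5.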
