Top-K Graph Pattern Matching: A Twig Query Approach

Many graph-based applications, including bioinformatics, social science, link analysis, citation analysis, and collaborative work, need to deal with a large data graph. In this paper, we study the problem of finding the top-k answers for a graph query over a large data graph. In particular, we focus on top-k cyclic graph queries, where the graph query is cyclic and can be complex. Supporting top-k cyclic graph queries over a data graph gives users much more flexibility in searching graphs, but the problem itself is challenging. After investigating a direct yet infeasible solution, we propose a new twig query approach. We first identify a spanning tree of the cyclic graph query, which is used to generate a list of ranked twig answers on demand. We then identify the top-k answers for the graph query based on the twig answer list. To find the best twig query for solving a given cyclic graph query, we study cost-based optimization for twig query selection. We conducted extensive performance studies using a real dataset, and we report our findings in this paper.
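The two-step idea in the abstract (take a spanning tree of the cyclic query, then derive top-k graph answers from a ranked twig answer list) can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that lower scores are better, that twig answers arrive in ascending score order, and that each non-tree query edge adds a non-negative cost, so a twig score is a lower bound on the full answer score. The names `split_query`, `top_k`, and `extra_cost` are hypothetical, and the spanning tree is picked by plain BFS rather than by the cost-based selection the paper studies.

```python
import heapq
from collections import deque


def split_query(nodes, edges):
    """Split a connected query graph into BFS spanning-tree edges and non-tree edges."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    root = nodes[0]
    seen = {root}
    tree = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))
                queue.append(v)
    tree_set = {frozenset(e) for e in tree}
    non_tree = [e for e in edges if frozenset(e) not in tree_set]
    return tree, non_tree


def top_k(twig_answers, non_tree, extra_cost, k):
    """Return the k lowest-scoring graph answers as (score, mapping) pairs.

    twig_answers: iterable of (twig_score, mapping) in ascending twig_score,
                  where mapping sends query nodes to data nodes.
    extra_cost:   assumed helper; returns a non-negative cost for matching a
                  non-tree edge between two data nodes, or None if impossible.
    """
    best = []  # max-heap on score (scores negated), holds at most k entries
    for i, (twig_score, mapping) in enumerate(twig_answers):
        # Every later answer scores at least twig_score, so stop once the
        # current k-th best score cannot be beaten.
        if len(best) == k and -best[0][0] <= twig_score:
            break
        total = twig_score
        valid = True
        for u, v in non_tree:
            cost = extra_cost(mapping[u], mapping[v])
            if cost is None:
                valid = False
                break
            total += cost
        if not valid:
            continue
        item = (-total, i, mapping)  # i breaks ties so mappings are never compared
        if len(best) < k:
            heapq.heappush(best, item)
        elif total < -best[0][0]:
            heapq.heapreplace(best, item)
    ordered = sorted(best, key=lambda t: (-t[0], t[1]))
    return [(-neg, mapping) for neg, _, mapping in ordered]
```

Because twig answers are consumed lazily, the loop can stop before the twig answer list is exhausted, which is the point of generating that list on demand.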
