DistanceJoin: Pattern Match Query In a Large Graph Database

The growing popularity of graph databases has generated interesting data management problems, such as subgraph search, shortest-path query, reachability verification, and pattern match. Among these, a pattern match query is more flexible than a subgraph search and more informative than a shortest-path or reachability query. In this paper, we address pattern match problems over a large data graph G. Specifically, given a pattern graph (i.e., query Q), we want to find all matches in G whose connections are similar to those in Q. To reduce the search space significantly, we first transform the vertices into points in a vector space via graph embedding techniques, converting a pattern match query into a distance-based multi-way join problem over the resulting vector space. We also propose several pruning strategies and a join order selection method to make join processing efficient. Extensive experiments on both real and synthetic datasets show that our method outperforms existing ones by orders of magnitude.
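
The idea of turning a pattern edge into a distance join over embedded points can be illustrated with a minimal sketch. The abstract does not specify the embedding, so the code below assumes a simple landmark-based embedding of a connected, undirected, unweighted graph: each vertex is mapped to its vector of hop distances to a few chosen landmarks. By the triangle inequality, the L-infinity distance between two such vectors never exceeds the true shortest-path distance, so it can serve as a pruning filter before exact verification. The names bfs_distances, embed, linf, and distance_join are hypothetical, and only a single two-way join is shown rather than the multi-way join with join order selection described in the paper.

```python
from collections import deque
from itertools import product


def bfs_distances(adj, source):
    """Hop-count shortest-path distances from source (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def embed(adj, landmarks):
    """Map each vertex to its distances to the landmarks.

    Assumes the graph is connected, so every distance is finite.
    """
    tables = [bfs_distances(adj, l) for l in landmarks]
    return {v: tuple(t[v] for t in tables) for v in adj}


def linf(p, q):
    """L-infinity distance; a lower bound on the shortest-path distance."""
    return max(abs(a - b) for a, b in zip(p, q))


def distance_join(adj, coords, left, right, delta):
    """Return pairs (u, v), u in left and v in right, with dist(u, v) <= delta.

    The embedding is used only as a filter: a pair is discarded when its
    L-infinity distance already exceeds delta, and verified exactly otherwise.
    """
    exact = {}  # cache of exact BFS tables, one per left vertex that needs it
    result = []
    for u, v in product(left, right):
        if linf(coords[u], coords[v]) > delta:
            continue  # pruned without touching the graph
        if u not in exact:
            exact[u] = bfs_distances(adj, u)
        if exact[u][v] <= delta:
            result.append((u, v))
    return result


# Hypothetical toy graph: a 4-cycle 0-1-3-2-0 with a tail 3-4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
coords = embed(adj, landmarks=[0, 4])
pairs = distance_join(adj, coords, left=[0, 1], right=[2, 4], delta=2)
print(pairs)
```

In this toy graph the pair (0, 4) is pruned by the embedding alone, while (1, 2) has embedded distance 0 but true distance 2, which is why the filter must be followed by an exact check.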
