Continuous Subgraph Pattern Search over Graph Streams

Search over graph databases has attracted much attention recently due to its usefulness in many fields, such as the analysis of chemical compounds, intrusion detection in network traffic data, and pattern matching over users' visiting logs. However, most of the existing work focuses on search over static graph databases while in many real applications graphs are changing over time. In this paper we investigate a new problem on continuous subgraph pattern search under the situation where multiple target graphs are constantly changing in a stream style, namely the subgraph pattern search over graph streams. Obviously the proposed problem is a continuous join between query patterns and graph streams where the join predicate is the existence of subgraph isomorphism. Due to the NP-completeness of subgraph isomorphism checking, to achieve the real time monitoring of the existence of certain subgraph patterns, we would like to avoid using subgraph isomorphism verification to find the exact query-stream subgraph isomorphic pairs but to offer an approximate answer that could report all probable pairs without missing any of the actual answer pairs. In this paper we propose a light-weight yet effective feature structure called Node-Neighbor Tree to filter false candidate query-stream pairs. To reduce the computational cost, we further project the feature structures into a numerical vector space and conduct dominant relationship checking in the projected space. We propose two methods to efficiently check dominant relationships and substantiate our methods with extensive experiments.

[1]  Jignesh M. Patel,et al.  SAGA: a subgraph matching tool for biological graphs , 2007, Bioinform..

[2]  Ambuj K. Singh,et al.  Closure-Tree: An Index Structure for Graph Queries , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[3]  Bernhard Seeger,et al.  An optimal and progressive algorithm for skyline queries , 2003, SIGMOD '03.

[4]  Wilfred Ng,et al.  Fg-index: towards verification-free query processing on graph databases , 2007, SIGMOD '07.

[5]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[6]  Like Gao,et al.  Continually evaluating similarity-based pattern queries on a streaming time series , 2002, SIGMOD '02.

[7]  Jeffrey Xu Yu,et al.  Taming verification hardness: an efficient algorithm for testing subgraph isomorphism , 2008, Proc. VLDB Endow..

[8]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[9]  Ulf Leser,et al.  Fast and practical indexing and querying of very large graphs , 2007, SIGMOD '07.

[10]  Ron Shamir,et al.  Faster subtree isomorphism , 1997, Proceedings of the Fifth Israeli Symposium on Theory of Computing and Systems.

[11]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[12]  Wei Wang,et al.  Graph Database Indexing Using Structured Graph Decomposition , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[13]  Philip S. Yu,et al.  GraphScope: parameter-free mining of large time-evolving graphs , 2007, KDD '07.

[14]  Shmuel Friedland,et al.  On the graph isomorphism problem , 2008, ArXiv.

[15]  Philip S. Yu,et al.  Dual Labeling: Answering Graph Reachability Queries in Constant Time , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[16]  Shijie Zhang,et al.  TreePi: A Novel Graph Indexing Method , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[17]  S. Sudarshan,et al.  Keyword searching and browsing in databases using BANKS , 2002, Proceedings 18th International Conference on Data Engineering.

[18]  Ambuj K. Singh,et al.  Graphs-at-a-time: query language and access methods for graph databases , 2008, SIGMOD Conference.

[19]  Philip S. Yu,et al.  Substructure similarity search in graph databases , 2005, SIGMOD '05.

[20]  Ge Yu,et al.  Similarity Match Over High Speed Time-Series Streams , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[21]  Philip S. Yu,et al.  Searching Substructures with Superimposed Distance , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[22]  Philip S. Yu,et al.  GString: A Novel Approach for Efficient Search in Graph Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[23]  Philip S. Yu,et al.  Graph Indexing: Tree + Delta >= Graph , 2007, VLDB.

[24]  Lei Zou,et al.  A novel spectral coding in a large graph database , 2008, EDBT '08.

[25]  Philip S. Yu,et al.  Towards Graph Containment Search and Indexing , 2007, VLDB.

[26]  Dennis Shasha,et al.  Algorithmics and applications of tree and graph searching , 2002, PODS.

[27]  Jignesh M. Patel,et al.  TALE: A Tool for Approximate Large Graph Matching , 2008, 2008 IEEE 24th International Conference on Data Engineering.