Iterative Graph Feature Mining for Graph Indexing

Sub graph search is a popular query scenario on graph databases. Given a query graph q, the sub graph search algorithm returns all database graphs having q as a sub graph. To efficiently implement a subgraph search, subgraph features are mined in order to index the graph database. Many subgraph feature mining approaches have been proposed. They are all "mine-at-once" algorithms in which the whole feature set is mined in one run before building a stable graph index. However, due to the change of environments (such as an update of the graph database and the increase of available memory), the index needs to be updated to accommodate such changes. Most of the "mine-at-once" algorithms involve frequent subgraph or subtree mining over the whole graph database. Also, constructing and deploying a new index involves an expensive disk operation such that it is inefficient to re-mine the features and rebuild the index from scratch. We observe that, under most cases, it is sufficient to update a small part of the graph index. Here we propose an "iterative subgraph mining" algorithm which iteratively finds one feature to insert into (or remove from) the index. Since the majority of indexing features and the index structure are not changed, the algorithm can be frequently invoked. We define an objective function that guides the feature mining. Next, we propose a basic branch and bound algorithm to mine the features. Finally, we design an advanced search algorithm, which quickly finds a near-optimum subgraph feature and reduces the search space. Experiments show that our feature mining algorithm is 5 times faster than the popular graph indexing algorithm gIndex, and that features mined by our iterative algorithm have a better filtering rate for the subgraph search problem.

[1]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[2]  Ambuj K. Singh,et al.  GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[3]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[4]  Nicole Krämer,et al.  Partial least squares regression for graph mining , 2008, KDD.

[5]  Philip S. Yu,et al.  Mining significant graph patterns by leap search , 2008, SIGMOD Conference.

[6]  Ernesto Estrada,et al.  Chemical Graph Theory , 2013 .

[7]  Luc De Raedt,et al.  Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds , 2004, J. Chem. Inf. Model..

[8]  R. Allen Miller,et al.  A database system of mechanical components based on geometric and topological similarity. Part I: representation , 2003, Comput. Aided Des..

[9]  Philip S. Yu,et al.  Graph Indexing: Tree + Delta >= Graph , 2007, VLDB.

[10]  Jiawei Han,et al.  Data Mining: Concepts and Techniques, Second Edition , 2006, The Morgan Kaufmann series in data management systems.

[11]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[12]  Dennis Shasha,et al.  Algorithmics and applications of tree and graph searching , 2002, PODS.

[13]  Jeffrey Xu Yu,et al.  Taming verification hardness: an efficient algorithm for testing subgraph isomorphism , 2008, Proc. VLDB Endow..

[14]  Wei Wang,et al.  Graph Database Indexing Using Structured Graph Decomposition , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[15]  R. Allen Miller,et al.  A database system of mechanical components based on geometric and topological similarity. Part II: indexing, retrieval, matching, and similarity assessment , 2003, Comput. Aided Des..

[16]  Stephen A. Cook,et al.  The complexity of theorem-proving procedures , 1971, STOC.

[17]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[18]  Yuji Matsumoto,et al.  An Application of Boosting to Graph Classification , 2004, NIPS.

[19]  Joost N. Kok,et al.  The Gaston Tool for Frequent Subgraph Mining , 2005, GraBaTs.

[20]  Mario Vento,et al.  Graph matching applications in pattern recognition and image processing , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[21]  Philip S. Yu,et al.  GString: A Novel Approach for Efficient Search in Graph Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[22]  Wilfred Ng,et al.  Fg-index: towards verification-free query processing on graph databases , 2007, SIGMOD '07.

[23]  Jeffrey Xu Yu,et al.  iGraph: A Framework for Comparisons of Disk-Based Graph Indexing Techniques , 2010, Proc. VLDB Endow..

[24]  Shijie Zhang,et al.  TreePi: A Novel Graph Indexing Method , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[25]  Lei Zou,et al.  A novel spectral coding in a large graph database , 2008, EDBT '08.

[26]  Ambuj K. Singh,et al.  Closure-Tree: An Index Structure for Graph Queries , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[27]  Bernhard Schölkopf,et al.  Kernel Methods in Computational Biology , 2005 .

[28]  Philip S. Yu,et al.  Direct mining of discriminative and essential frequent patterns via model-based search tree , 2008, KDD.

[29]  El-MehalawiMohamed,et al.  A database system of mechanical components based on geometric and topological similarity. Part II , 2003 .

[30]  C. Lee Giles,et al.  Independent informative subgraph mining for graph information retrieval , 2009, CIKM.

[31]  Takashi Washio,et al.  An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data , 2000, PKDD.