A set-cover-based approach for inexact graph matching

Network querying is a growing field with applications ranging from screening compounds against a database of known molecules to matching subnetworks across species. Graph indexing is a powerful technique for answering queries against a large database of graphs. Most graph indexing methods to date tackle the exact matching (isomorphism) problem, which limits their applicability to instances in which exact matches exist. Here we present a novel graph indexing method for the more general, inexact matching problem. Our method, SIGMA, builds on approximating a new variant of the set-cover problem that concerns overlapping multi-sets. We test our method extensively and compare it to a naive approach and to the state-of-the-art method Grafil. We show that SIGMA outperforms both, providing higher pruning power in all tested scenarios.
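For readers unfamiliar with set-cover approximation, the following is a minimal sketch of the classical greedy heuristic for the standard set-cover problem. It is a textbook illustration only, not SIGMA's algorithm: the variant over overlapping multi-sets that SIGMA approximates is not specified in this abstract, and the function name `greedy_set_cover` and its inputs are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Classical greedy heuristic for standard set cover (illustrative only).

    universe: iterable of elements that must be covered.
    subsets:  list of Python sets; the returned list holds the indices chosen.
    This is NOT the multi-set variant that SIGMA approximates.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset covering the most still-uncovered elements
        # (ties go to the lowest index).
        best = max(range(len(subsets)),
                   key=lambda i: len(uncovered & subsets[i]),
                   default=None)
        if best is None or not (uncovered & subsets[best]):
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical toy instance: five elements, four candidate subsets.
example_universe = {1, 2, 3, 4, 5}
example_subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
example_cover = greedy_set_cover(example_universe, example_subsets)
```

The greedy rule repeatedly takes the subset that covers the most remaining elements, which gives a logarithmic approximation guarantee for the standard problem. How such a cover translates into pruning candidate graphs is specific to SIGMA and is not reproduced here.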
