SQBC: An efficient subgraph matching method over large and dense graphs

Recent progress in biology and computer science has generated many complicated networks, most of which can be modeled as large and dense graphs. Effective and efficient subgraph matching methods over these graphs are therefore urgently needed. Although several promising exploratory approaches have been proposed in recent years, they perform poorly when the graphs are large and dense. This paper presents a novel Subgraph Query technique Based on Clique features, called SQBC, which integrates a carefully designed clique encoding with the existing vertex encoding [40] as the basic index unit to reduce the search space. Furthermore, SQBC optimizes the subgraph isomorphism test based on clique features. Extensive experiments over biological networks, an RDF dataset, and synthetic graphs show that SQBC outperforms the most popular competitors in both effectiveness and efficiency, especially when the data graphs are large and dense.
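To illustrate why clique features can shrink the search space, the following minimal sketch prunes candidate vertices using a simple necessary condition: a subgraph isomorphism maps every clique of the query onto a clique of the same size in the data graph, so a query vertex lying in a k-clique can only match a data vertex that also lies in a clique of size at least k. This is not the SQBC encoding itself, which the abstract does not specify; the function names, the adjacency-set representation, and the toy graphs are all assumptions made for illustration.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting).

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    def expand(r, p, x):
        if not p and not x:
            yield set(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def largest_clique_per_vertex(adj):
    """Size of the largest clique containing each vertex (a toy clique feature)."""
    size = {v: 1 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            size[v] = max(size[v], len(clique))
    return size


def candidates(query_adj, query_label, data_adj, data_label):
    """Filter step: keep data vertices that pass label, degree and clique checks."""
    q_size = largest_clique_per_vertex(query_adj)
    d_size = largest_clique_per_vertex(data_adj)
    return {
        u: {
            v for v in data_adj
            if data_label[v] == query_label[u]
            and len(data_adj[v]) >= len(query_adj[u])   # degree filter
            and d_size[v] >= q_size[u]                  # clique filter
        }
        for u in query_adj
    }


# Hypothetical toy data: a triangle a-b-c with a path c-d-e attached.
data_adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
data_label = {v: "A" for v in data_adj}

# Query: a single triangle x-y-z.
query_adj = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
query_label = {u: "A" for u in query_adj}

cands = candidates(query_adj, query_label, data_adj, data_label)
print(cands["x"])
```

In this toy example, vertex `d` has the same label and enough neighbours to pass the degree filter, but it lies in no triangle, so the clique filter removes it before any isomorphism test is attempted. Exhaustive clique enumeration is exponential in the worst case, so the sketch is only suitable for small graphs.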

[1] Jignesh M. Patel, et al. SAGA: a subgraph matching tool for biological graphs, 2007, Bioinform.

[2] Lei Zou, et al. Answering Subgraph Queries over Large Graphs, 2011, WAIM.

[3] Shijie Zhang, et al. GADDI: distance index based subgraph matching in biological networks, 2009, EDBT '09.

[4] Philip S. Yu, et al. Graph Indexing: Tree + Delta >= Graph, 2007, VLDB.

[5] Jan Ramon, et al. Frequent subgraph mining in outerplanar graphs, 2006, KDD.

[6] Jeffrey Xu Yu, et al. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism, 2008, Proc. VLDB Endow.

[7] Akira Tanaka, et al. The worst-case time complexity for generating all maximal cliques and computational experiments, 2006, Theor. Comput. Sci.

[8] Lei Zou, et al. A novel spectral coding in a large graph database, 2008, EDBT '08.

[9] Jeong-Hoon Lee, et al. An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases, 2012, Proc. VLDB Endow.

[10] Dennis Shasha, et al. SING: Subgraph search In Non-homogeneous Graphs, 2010, BMC Bioinformatics.

[11] Shijie Zhang, et al. TreePi: A Novel Graph Indexing Method, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[12] M E J Newman, et al. Identity and Search in Social Networks, 2002, Science.

[13] Natwar Modani, et al. Large maximal cliques enumeration in sparse graphs, 2008, CIKM '08.

[14] Nicos Christofides, et al. Vehicle routing with a sparse feasibility graph, 1997.

[15] B. C. Brookes, et al. Information Sciences, 2020, Cognitive Skills You Need for the 21st Century.

[16] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[17] Philip S. Yu, et al. A Load Shedding Framework and Optimizations for M-way Windowed Stream Joins, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[18] L. Mirny, et al. Protein complexes and functional modules in molecular networks, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[19] R Samudrala, et al. A graph-theoretic algorithm for comparative modeling of protein structure, 1998, Journal of molecular biology.

[20] Wei Jin, et al. SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs, 2010, Proc. VLDB Endow.

[21] Kengo Katayama, et al. An effective local search for the maximum clique problem, 2005, Inf. Process. Lett.

[22] Shinya Takahashi, et al. A Simple and Faster Branch-and-Bound Algorithm for Finding a Maximum Clique, 2010, WALCOM.

[23] Gerhard Weikum, et al. YAGO: A Large Ontology from Wikipedia and WordNet, 2008, J. Web Semant.

[24] Etsuji Tomita, et al. Clique-based data mining for related genes in a biomedical database, 2009, BMC Bioinformatics.

[25] Xuemin Lin, et al. NOVA: A Novel and Efficient Framework for Finding Subgraph Isomorphism Mappings in Large Graphs, 2010, DASFAA.

[26] Julian R. Ullmann, et al. An Algorithm for Subgraph Isomorphism, 1976, J. ACM.

[27] Jeremy J. Carroll, et al. Resource description framework (rdf) concepts and abstract syntax, 2003.

[28] Etsuji Tomita, et al. An Efficient Branch-and-bound Algorithm for Finding a Maximum Clique with Computational Experiments, 2001, J. Glob. Optim.

[29] Mario Vento, et al. A (sub)graph isomorphism algorithm for matching large graphs, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30] Patric R. J. Östergård, et al. A fast algorithm for the maximum clique problem, 2002, Discret. Appl. Math.

[31] Federica Mandreoli, et al. Flexible query answering on graph-modeled data, 2009, EDBT '09.

[32] Samir Khuller, et al. On Finding Dense Subgraphs, 2009, ICALP.

[33] Wei Wang, et al. Graph Database Indexing Using Structured Graph Decomposition, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[34] Philip S. Yu, et al. Mining top-K large structural patterns in a massive network, 2011, Proc. VLDB Endow.

[35] Ambuj K. Singh, et al. Closure-Tree: An Index Structure for Graph Queries, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[36] Roded Sharan, et al. QNet: A Tool for Querying Protein Interaction Networks, 2007, RECOMB.

[37] Jiawei Han, et al. On graph query optimization in large networks, 2010, Proc. VLDB Endow.

[38] Dennis Shasha, et al. GraphGrep: A fast and universal method for querying graphs, 2002, Object recognition supported by user interaction for service robots.

[39] Xiang Lian, et al. Efficient query answering in probabilistic RDF graphs, 2011, SIGMOD '11.

[40] Wayne J. Pullan, et al. Simple ingredients leading to very efficient heuristics for the maximum clique problem, 2008, J. Heuristics.

[41] Wilfred Ng, et al. Fg-index: towards verification-free query processing on graph databases, 2007, SIGMOD '07.

[42] Dennis Shasha, et al. Algorithmics and applications of tree and graph searching, 2002, PODS.

[43] Jignesh M. Patel, et al. TALE: A Tool for Approximate Large Graph Matching, 2008, 2008 IEEE 24th International Conference on Data Engineering.