Comparison Issues in Large Graphs: State of the Art and Future Directions

Graph comparison is fundamental to many applications, such as the analysis of social networks and biological data, and has been a significant research area in pattern recognition and pattern analysis. Today's graphs are large: they may have billions of nodes and edges, which makes comparison in such huge graphs a challenging research problem. In this paper, we survey research advances on comparison problems in large graphs, reviewing graph comparison and pattern matching approaches that target large graphs. We categorize the existing approaches into three classes: partition-based approaches, search-space-based approaches, and summary-based approaches. All the existing algorithms in these classes are described in detail and analyzed according to multiple criteria, such as time complexity, the type of graphs handled, and the comparison concept used. Finally, we identify directions for future research.