Graph homomorphism revisited for graph matching

In a variety of emerging applications one needs to decide whether a graph G matches another Gp, i.e., whether G has a topological structure similar to that of Gp. The traditional notions of graph homomorphism and isomorphism often fall short of capturing the structural similarity in these applications. This paper studies revisions of these notions, providing a full treatment from complexity to algorithms. (1) We propose p-homomorphism (p-hom) and 1-1 p-hom, which extend graph homomorphism and subgraph isomorphism, respectively, by mapping edges from one graph to paths in another, and by measuring the similarity of nodes. (2) We introduce metrics to measure graph similarity, and several optimization problems for p-hom and 1-1 p-hom. (3) We show that the decision problems for p-hom and 1-1 p-hom are NP-complete even for DAGs, and that the optimization problems are approximation-hard. (4) Nevertheless, we provide approximation algorithms with provable guarantees on match quality. We experimentally verify the effectiveness of the revised notions and the efficiency of our algorithms in Web site matching, using real-life and synthetic data.
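The idea in (1) can be made concrete with a small check. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it assumes a candidate mapping counts as a p-hom when every node of G is mapped to a sufficiently similar node of Gp, and every edge of G is mapped to a nonempty path in Gp. The abstract does not spell out the formal definition, so the function names (`reachable_from`, `is_p_hom`), the adjacency-dictionary graph encoding, the `similarity` callback, and the `threshold` parameter are all hypothetical. It only verifies a given mapping; it does not search for one, which is the hard part.

```python
from collections import deque


def reachable_from(graph, source):
    """Return the nodes reachable from `source` by a path of at least one edge.

    `graph` is an adjacency dict: node -> iterable of successor nodes.
    """
    seen = set()
    queue = deque(graph.get(source, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def is_p_hom(g, gp, mapping, similarity, threshold, one_to_one=False):
    """Check a candidate mapping from nodes of g to nodes of gp.

    Assumed conditions (illustrative, not taken verbatim from the paper):
      1. every node of g is mapped to a node of gp;
      2. similarity(v, mapping[v]) >= threshold for every node v of g;
      3. every edge (v, w) of g maps to a nonempty path
         from mapping[v] to mapping[w] in gp;
      4. if one_to_one is set, the mapping is injective.
    """
    if set(mapping) != set(g):
        return False
    if any(target not in gp for target in mapping.values()):
        return False
    if one_to_one and len(set(mapping.values())) != len(mapping):
        return False
    if any(similarity(v, mapping[v]) < threshold for v in g):
        return False

    # Cache reachability per image node so each BFS runs at most once.
    reach = {}
    for v, successors in g.items():
        src = mapping[v]
        if src not in reach:
            reach[src] = reachable_from(gp, src)
        for w in successors:
            if mapping[w] not in reach[src]:
                return False
    return True
```

For example, with `g = {"a": ["b"], "b": []}` and `gp = {"x": ["y"], "y": ["z"], "z": []}`, the mapping `{"a": "x", "b": "z"}` passes the check under a constant similarity function, because the single edge of `g` maps to the path x, y, z in `gp`. A plain homomorphism check would reject it, since `gp` has no direct edge from x to z.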
