A measure of similarity between graph vertices

We introduce a concept of similarity between vertices of directed graphs. Let G_A and G_B be two directed graphs. We define a similarity matrix whose (i, j)-th real entry expresses how similar vertex j (in G_A) is to vertex i (in G_B). The similarity matrix can be obtained as the limit of the normalized even iterates of a linear transformation. In the special case where G_A = G_B = G, the matrix is square and the (i, j)-th entry is the similarity score between the vertices i and j of G. We point out that Kleinberg's "hub and authority" method for identifying web pages relevant to a given query can be viewed as a special case of our definition, in which one of the graphs has two vertices and a unique directed edge between them. In analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant eigenvector of a non-negative matrix. Potential applications of our similarity concept are numerous. We illustrate an application to the automatic extraction of synonyms in a monolingual dictionary.
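The abstract does not state the linear transformation itself. The sketch below assumes the update S ← B S Aᵀ + Bᵀ S A, followed by division by the Frobenius norm and started from the all-ones matrix, where A and B are the adjacency matrices of G_A and G_B. The function name `similarity_matrix` and the parameter `num_pairs` are hypothetical, and a fixed number of steps stands in for a proper convergence test. It is meant as an illustration of "normalized even iterates", not as the authors' implementation.

```python
import math


def similarity_matrix(A, B, num_pairs=50):
    """Approximate the similarity matrix of G_A and G_B.

    A and B are square 0/1 adjacency matrices given as lists of lists.
    The result has one row per vertex of G_B and one column per vertex of G_A.
    Assumed update: S <- B S A^T + B^T S A, then divide by the Frobenius norm.
    """
    n_a, n_b = len(A), len(B)
    S = [[1.0] * n_a for _ in range(n_b)]  # assumed starting point: all ones
    # Take an even number of steps (2 * num_pairs), since odd and even
    # iterates may settle on different limits.
    for _ in range(2 * num_pairs):
        nxt = [[0.0] * n_a for _ in range(n_b)]
        for i in range(n_b):
            for j in range(n_a):
                total = 0.0
                for k in range(n_b):
                    for l in range(n_a):
                        # (B S A^T)_ij uses B[i][k] * A[j][l];
                        # (B^T S A)_ij uses B[k][i] * A[l][j].
                        weight = B[i][k] * A[j][l] + B[k][i] * A[l][j]
                        if weight:
                            total += weight * S[k][l]
                nxt[i][j] = total
        norm = math.sqrt(sum(x * x for row in nxt for x in row))
        if norm == 0.0:  # no edges at all: nothing to normalize
            return nxt
        S = [[x / norm for x in row] for row in nxt]
    return S


# Hub-and-authority special case: G_A is the single edge hub -> authority.
edge = [[0, 1], [0, 0]]
# Toy G_B (an assumption for illustration): vertex 0 links to vertices 1 and 2.
web = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
scores = similarity_matrix(edge, web)
hubs = [row[0] for row in scores]         # similarity to the "hub" vertex
authorities = [row[1] for row in scores]  # similarity to the "authority" vertex
```

In this toy case the first column ranks vertex 0 as the only hub and the second column gives vertices 1 and 2 equal authority scores, which matches what Kleinberg's iteration would report for the same graph. The quadruple loop is chosen for readability and is only practical for small graphs.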

[1] Charles R. Johnson, et al. Topics in Matrix Analysis, 1991.

[2] Erhard Rahm, et al. Similarity flooding: a versatile graph matching algorithm and its application to schema matching, 2002, Proceedings 18th International Conference on Data Engineering.

[3] Béla Bollobás, et al. Random Graphs, 1985.

[4] Fan Chung, et al. Spectral Graph Theory, 1996.

[5] C. A. Trauth, et al. Connectedness of Products of Two Directed Graphs, 1966.

[6] Jon Kleinberg, et al. Authoritative sources in a hyperlinked environment, 1999, SODA '98.

[7] Vincent D. Blondel, et al. Automatic extraction of synonyms in a dictionary, 2002.

[8] Charles R. Johnson, et al. Matrix analysis, 1985.

[9] Jennifer Widom, et al. SimRank: a measure of structural-context similarity, 2002, KDD.