A MEASURE OF SIMILARITY BETWEEN GRAPH VERTICES. WITH APPLICATIONS TO SYNONYM EXTRACTION AND WEB SEARCHING

We introduce a concept of similarity between vertices of directed graphs. Let $G_A$ and $G_B$ be two directed graphs with, respectively, $n_A$ and $n_B$ vertices. We define an $n_B \times n_A$ similarity matrix $S$ whose real entry $s_{ij}$ expresses how similar vertex $j$ (in $G_A$) is to vertex $i$ (in $G_B$): we say that $s_{ij}$ is their similarity score. The similarity matrix can be obtained as the limit of the normalized even iterates of $S_{k+1} = B S_k A^T + B^T S_k A$, where $A$ and $B$ are the adjacency matrices of the graphs and $S_0$ is a matrix whose entries are all equal to 1. In the special case where $G_A = G_B = G$, the matrix $S$ is square and the score $s_{ij}$ is the similarity score between the vertices $i$ and $j$ of $G$. We point out that Kleinberg's "hub and authority" method to identify web pages relevant to a given query can be viewed as a special case of our definition in which one of the graphs has two vertices and a single directed edge between them. By analogy with Kleinberg, we show that our similarity scores are given by the components of a dominant eigenvector of a nonnegative matrix. Potential applications of our similarity concept are numerous. We illustrate an application to the automatic extraction of synonyms from a monolingual dictionary.
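The iteration above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the helper names `matmul`, `transpose` and `similarity_matrix` are hypothetical, the normalization is assumed to be by the Frobenius norm, and convergence is tested by comparing consecutive even iterates against an arbitrary tolerance `tol`. The usage at the end checks the Kleinberg special case on a small hypothetical graph $G_A$, taking $G_B$ to be two vertices joined by a single edge (indexed 0 → 1 in the code), so that row 0 of $S$ plays the role of hub scores and row 1 the role of authority scores.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(X))]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def similarity_matrix(A, B, tol=1e-10, max_iter=500):
    """Sketch: nB x nA similarity matrix for adjacency matrices A (nA x nA), B (nB x nB).

    Iterates S_{k+1} = B S_k A^T + B^T S_k A from the all-ones matrix S_0,
    dividing by the Frobenius norm after every step (assumed normalization),
    and stops once two consecutive even iterates differ by less than tol.
    """
    nA, nB = len(A), len(B)
    AT, BT = transpose(A), transpose(B)

    def step(S):
        X = matmul(matmul(B, S), AT)  # B S_k A^T
        Y = matmul(matmul(BT, S), A)  # B^T S_k A
        Z = [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
        norm = math.sqrt(sum(v * v for row in Z for v in row))
        if norm == 0.0:
            raise ValueError("iterate vanished: both graphs need at least one edge")
        return [[v / norm for v in row] for row in Z]

    S = [[1.0] * nA for _ in range(nB)]  # S_0: every entry equal to 1
    for _ in range(max_iter):
        S_next = step(step(S))  # advance two steps so only even iterates are compared
        diff = max(abs(p - q) for rp, rq in zip(S_next, S) for p, q in zip(rp, rq))
        S = S_next
        if diff < tol:
            break
    return S


# Kleinberg special case. Hypothetical G_A with edges 0->2, 1->2, 1->3.
A = [[0, 0, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
# G_B: two vertices and a single directed edge 0 -> 1.
B = [[0, 1],
     [0, 0]]

S = similarity_matrix(A, B)
hubs, authorities = S[0], S[1]  # row 0: similarity to the edge's source; row 1: to its target
print("hubs:", [round(h, 4) for h in hubs])
print("authorities:", [round(a, 4) for a in authorities])
```

For this graph, vertex 1 (two outgoing edges) receives the larger hub score and vertex 2 (two incoming edges) the larger authority score. Within each row the ratio of the two nonzero scores tends to the golden ratio $(1+\sqrt{5})/2$, which is what the dominant eigenvectors of $AA^T$ and $A^TA$ used by Kleinberg's method give for this example.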
