Identification of clusters in the Web graph based on link topology

The Web graph has recently been used to model the link structure of the Web. The study of such graphs can yield valuable insights into Web algorithms for crawling, searching, and discovering Web communities. This paper proposes a new approach to clustering the Web graph. The proposed algorithm first identifies a small subset of the graph as "core" members of clusters and then incrementally constructs the clusters around them according to a selection criterion. Two qualitative criteria are proposed to measure the quality of a graph clustering. We have implemented the algorithm and tested it on a set of arbitrary graphs, with good results. Applications of our approach include graph drawing and Web visualization.
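
The two-phase idea described above (pick core members, then grow clusters incrementally) can be illustrated with a minimal Python sketch. The abstract does not say how cores are chosen or what the selection criterion is, so both rules below are assumptions made only for illustration: cores are taken to be high-degree nodes that are not adjacent to each other, and the criterion assigns, at each step, the unassigned node with the largest fraction of its links pointing into a single existing cluster. The function name `cluster_by_cores` and its parameters are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def cluster_by_cores(edges, num_cores):
    """Illustrative core-then-grow clustering; not the paper's exact algorithm."""
    # Build an undirected adjacency map from the list of links.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # ASSUMPTION: cores are the highest-degree nodes that are not
    # adjacent to a previously chosen core.
    cores = []
    for node in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if len(cores) == num_cores:
            break
        if all(node not in adj[c] for c in cores):
            cores.append(node)

    label = {core: i for i, core in enumerate(cores)}
    unassigned = set(adj) - set(label)

    # ASSUMPTION: the selection criterion is the fraction of a node's
    # links that point into one cluster; the best-scoring node joins first.
    while unassigned:
        best = None  # (score, node, cluster_id)
        for node in sorted(unassigned):
            counts = defaultdict(int)
            for nb in adj[node]:
                if nb in label:
                    counts[label[nb]] += 1
            for cid in sorted(counts):
                score = counts[cid] / len(adj[node])
                if best is None or score > best[0]:
                    best = (score, node, cid)
        if best is None:
            break  # remaining nodes have no link to any cluster
        _, node, cid = best
        label[node] = cid
        unassigned.remove(node)
    return label


# Two triangles joined by a single bridging link (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
labels = cluster_by_cores(edges, num_cores=2)
```

On this toy graph the sketch places each triangle in its own cluster. Nodes that cannot reach any core are left unlabeled rather than forced into a cluster, which is a design choice of this sketch and not something the abstract specifies.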
