Web Communities: Models and Algorithms

In the last few years, much research has been devoted to developing new techniques for improving the recall and precision of web search engines. Few works, however, address the problem of identifying the communities to which pages belong. Most previous approaches cluster data by means of spectral techniques or traditional hierarchical algorithms. The main problem with these techniques is that they ignore the fact that web communities are social networks with distinctive statistical properties.

In this paper we analyze web communities on the basis of the evolution of an initial set of hubs and authoritative pages. The evolution law captures the behaviour of page authors with respect to the popularity of existing pages for the topics of interest. Assuming such a model, we have found interesting properties of web communities. On the basis of these properties we have proposed a technique for computing relevant properties for specific topics. Several experiments confirmed the validity of both the model and the identification method.
