Inferring Web communities from link topology

The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally created hypermedia. To extract meaningful structure under such circumstances, we develop a notion of hyperlinked communities on the WWW through an analysis of the link topology. By invoking a simple, mathematically clean method for defining and exposing the structure of these communities, we are able to derive a number of themes: the communities can be viewed as containing a core of central, “authoritative” pages linked together, and they exhibit a natural type of hierarchical topic generalization that can be inferred directly from the pattern of linkage. Our investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a “local” level, it results in a much greater degree of orderly high-level structure than has typically been assumed.
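The abstract does not spell out the method, so the following is only an illustrative sketch of one well-known link-analysis technique of this kind: the mutually reinforcing hub/authority iteration. Whether it matches the paper's exact procedure is an assumption. The function names, the toy graph `toy_links`, and the iteration count are all hypothetical choices made for the example.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hubs_and_authorities(links, iterations=50):
    """Iteratively score pages as hubs and authorities.

    links maps each page to the list of pages it links to.
    This is a sketch of the hub/authority idea, not the paper's code.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores
        # of the pages that link to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # A page's hub score is the sum of the (updated) authority
        # scores of the pages it links to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, []))
            for p in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical toy graph: three hub-like pages pointing at two targets.
toy_links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}

hub, auth = hubs_and_authorities(toy_links)
top_authority = max(auth, key=auth.get)
```

In this toy graph the page with the most in-links from good hubs ends up as the top authority, and pages that point at several strong authorities score higher as hubs than pages that point at only one.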
