Linkage Analysis for the World Wide Web and Its Application: A Survey

The World Wide Web (WWW) has grown into a large hyperlinked corpus with more than 800 million pages and 5.6 billion hyperlinks. Moreover, no global planning can be imposed on the creation of such a corpus, which poses challenges for many areas of Web research. On the other hand, hyperlinked Web pages in a networked environment can be a very rich information source for everyday and business use, provided people have effective means of understanding the Web. Linkage analysis is playing an increasingly significant role in many Web-related fields. This paper presents recent advances in research on, and applications of, linkage analysis of the World Wide Web. In particular, it surveys results on linkage analysis and its applications to Web searching, Web community discovery, and Web modeling.
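To make link-based ranking for Web searching concrete, here is a minimal sketch that computes hub and authority scores by power iteration, in the spirit of Kleinberg's HITS algorithm. It is an illustration only, not a method taken from this survey. The function name hits, the dict-of-lists graph representation, the fixed iteration count, and the toy graph are all assumptions made for this example.

```python
import math


def hits(graph, iterations=50):
    """Hub/authority scores by power iteration (illustrative sketch).

    graph: dict mapping each page to the list of pages it links to
    (a hypothetical representation chosen for this example).
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        # Hub update: sum the authority scores of all pages linked to.
        new_hubs = {
            p: sum(new_auths[dst] for dst in graph.get(p, [])) for p in pages
        }
        # Normalize to unit Euclidean length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in new_auths.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        auths = {p: v / a_norm for p, v in new_auths.items()}
        hubs = {p: v / h_norm for p, v in new_hubs.items()}
    return hubs, auths
```

On the hypothetical graph {"a": ["c", "d"], "b": ["c", "d"], "c": ["d"], "d": []}, page d gets the highest authority score because every other page links to it, while a and b tie as the strongest hubs. The fixed iteration count keeps the sketch short. A practical version would more likely stop once the scores change by less than some tolerance.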