Characterising Web Site Link Structure

The topological structures of the Internet and the Web have received considerable attention. However, there has been little research on the topological properties of individual web sites. In this paper, we consider whether web sites (as opposed to the entire Web) exhibit structural similarities. To do so, we exhaustively crawled 18 web sites as diverse as governmental departments, commercial companies and university departments in different countries. These web sites ranged in size from a few thousand pages to millions of pages. Statistical analysis of these 18 sites revealed that their internal link structures are significantly different when measured with first- and second-order topological properties, i.e. properties based on the connectivity of individual nodes or of pairs of nodes. However, examination of a third-order topological property, which considers the connectivity between three nodes that form a triangle, revealed a strong correspondence across web sites, suggestive of an invariant. Comparison with the Web, the AS-level Internet and a citation network showed that this third-order property is not shared by other types of networks. Nor is the property exhibited by generative network models such as that of Barabási and Albert.
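The abstract does not name the exact metrics used at each order, so the following is only a minimal sketch assuming common choices: the degree distribution P(k) as a first-order property, Newman's degree assortativity coefficient r as a second-order property, and per-node triangle counts with the local clustering coefficient as third-order properties. Treating hyperlinks as undirected and dropping self-links is an assumption, and every function name here is hypothetical. A small preferential-attachment generator in the style of Barabási and Albert is included so the same measures can be computed on a model graph for comparison.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def build_undirected(edges):
    """Adjacency sets from (source, target) link pairs.

    Assumption: link direction and self-links are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_distribution(adj):
    """First-order property: P(k), the fraction of nodes with degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def assortativity(adj):
    """Second-order property: Newman's degree assortativity r, i.e. the
    Pearson correlation of the degrees at the two ends of each edge."""
    # Each undirected edge is listed in both directions, so both ends share one mean.
    pairs = [(len(adj[u]), len(adj[v])) for u in adj for v in adj[u]]
    if not pairs:
        return float("nan")
    m = len(pairs)
    mean = sum(x for x, _ in pairs) / m
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / m
    var = sum((x - mean) ** 2 for x, _ in pairs) / m
    return cov / var if var else float("nan")


def triangles_per_node(adj):
    """Third-order property: number of triangles each node belongs to."""
    return {
        u: sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        for u, nbrs in adj.items()
    }


def clustering(adj):
    """Local clustering coefficient: linked neighbour pairs / possible pairs."""
    tri = triangles_per_node(adj)
    return {
        u: 2 * tri[u] / (len(nbrs) * (len(nbrs) - 1)) if len(nbrs) > 1 else 0.0
        for u, nbrs in adj.items()
    }


def barabasi_albert(n, m, seed=None):
    """Preferential-attachment edge list: each new node links to m distinct
    existing nodes, chosen roughly in proportion to their degree."""
    rng = random.Random(seed)
    edges = []
    repeated = []             # each node appears once per unit of its degree
    targets = list(range(m))  # the first new node links to the m seed nodes
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:  # degree-weighted draws; repeats are re-drawn
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges


if __name__ == "__main__":
    # Toy site: pages a, b, c link in a triangle; page d hangs off c.
    site = build_undirected([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
    print(degree_distribution(site))      # {1: 0.25, 2: 0.5, 3: 0.25}
    print(round(assortativity(site), 3))  # -0.714
    print(triangles_per_node(site))       # {'a': 1, 'b': 1, 'c': 1, 'd': 0}
    model = build_undirected(barabasi_albert(1000, 3, seed=42))
    c = clustering(model)
    print(sum(c.values()) / len(c))       # mean clustering of the model graph
```

On the toy graph r is negative because the highest-degree page c links mostly to lower-degree pages; on real crawls the same functions would be applied to each site's internal link graph and to the model graph, and the resulting curves compared.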

References

[1] Priya Mahadevan, et al. Systematic topology analysis and generation using degree correlations. SIGCOMM, 2006.

[2] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web. WWW, 1999.

[3] S. N. Dorogovtsev, et al. Evolution of networks. arXiv:cond-mat/0106144, 2001.

[4] Alessandro Vespignani, et al. Evolution and Structure of the Internet: A Statistical Physics Approach. 2004.

[5] Wilfred Ng. An Introduction to Search Engines and Web Navigation [book review]. 2007.

[6] M. Faloutsos. The internet AS-level topology: three data sources and one definitive metric. Comput. Commun. Rev., 2006.

[7] Michalis Faloutsos, et al. Internet Topology. Encyclopedia of Complexity and Systems Science, 2009.

[8] Ingemar J. Cox, et al. A Comparison of On-Line Computer Science Citation Databases. ECDL, 2005.

[9] Ingemar J. Cox, et al. The web structure of e-government: developing a methodology for quantitative evaluation. WWW '06, 2006.

[10] B. Bollobás. The evolution of random graphs. 1984.

[11] M. Newman, et al. Mixing patterns in networks. Physical Review E, 2002.

[12] A. Rényi. On the evolution of random graphs. 2001.

[13] Shi Zhou, et al. The rich-club phenomenon in the Internet topology. IEEE Communications Letters, 2003.

[14] Duncan J. Watts, et al. Collective dynamics of ‘small-world’ networks. Nature, 1998.

[15] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 1999.

[16] Priya Mahadevan, et al. The internet AS-level topology: three data sources and one definitive metric. Comput. Commun. Rev., 2005.

[17] M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 2002.

[18] R. Pastor-Satorras, et al. Dynamical and correlation properties of the Internet. Physical Review Letters, 2001.

[19] Shi Zhou, et al. Structural constraints in complex networks. arXiv:physics/0702096, 2007.

[20] Mark Levene, et al. An Introduction to Search Engines and Web Navigation (2nd ed.). 2005.

[21] Alessandro Vespignani, et al. Topology and correlations in structured scale-free networks. Physical Review E, 2002.

[22] Alessandro Vespignani, et al. Detecting rich-club ordering in complex networks. arXiv:physics/0602134, 2006.

[23] Krishna Bharat, et al. Who links to whom: mining linkage between Web sites. Proceedings 2001 IEEE International Conference on Data Mining, 2001.

[24] Chris H. Q. Ding, et al. Web document clustering using hyperlink structures. Comput. Stat. Data Anal., 2001.

[25] Ravi Kumar, et al. Trawling the Web for Emerging Cyber-Communities. Comput. Networks, 1999.

[26] P. Erdős, et al. On the evolution of random graphs. 1984.

[27] Priya Mahadevan, et al. Systematic topology analysis and generation using degree correlations. SIGCOMM, 2006.

[28] Ian Soboroff, et al. Does WT10g look like the web? SIGIR '02, 2002.

[29] Jon Kleinberg, et al. Authoritative sources in a hyperlinked environment. SODA '98, 1999.

[30] Shi Zhou, et al. Accurately modeling the Internet topology. Physical Review E, 2004.