On the universality of rank distributions of website popularity

We present an extensive analysis of long-term statistics of the queries to websites using logs collected on several web caches in Russian academic networks and on US IR Cache caches. We check the sensitivity of the statistics to several parameters: (1) the duration of data collection, (2) the geographical location of the cache server collecting the data, and (3) the year of data collection. We propose a two-parameter modification of the Zipf law and interpret the parameters. We find that the rank distribution of websites is stable when approximated by the modified Zipf law. We suggest that website popularity may be a universal property of the Internet.
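As a rough illustration of what a rank distribution of website popularity is, the sketch below counts requests per host, sorts the counts by rank, and estimates the exponent of the plain one-parameter Zipf law by a least-squares fit on log-log axes. The abstract does not state the form of the two-parameter modification, so it is not reproduced here; the function names, the synthetic host list, and the fitting method are all assumptions made for this example, not the authors' procedure.

```python
import math
from collections import Counter


def rank_distribution(hosts):
    """Return per-host request counts sorted in descending order (rank 1 first)."""
    return sorted(Counter(hosts).values(), reverse=True)


def fit_zipf_exponent(counts):
    """Estimate alpha in count ~ C / rank**alpha.

    Uses an ordinary least-squares fit of log(count) against log(rank).
    This is the plain Zipf law, assumed here for illustration only.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is negative, so alpha is its negation


# Synthetic stand-in for a cache log: host k receives about 1000 / k requests.
hosts = []
for rank in range(1, 51):
    hosts += [f"site{rank}.example"] * round(1000 / rank)

counts = rank_distribution(hosts)
alpha = fit_zipf_exponent(counts)
print(f"hosts: {len(counts)}, top count: {counts[0]}, fitted alpha: {alpha:.3f}")
```

With real cache logs, `hosts` would be the list of hostnames extracted from the request URLs; comparing the fitted parameters across collection periods, cache locations, and years is the kind of stability check the abstract describes.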
