On the tails of web file size distributions

Power laws have been observed in various contexts in the Internet. There has been considerable interest in identifying the mechanisms behind these power laws, and most of this work has focused on the tail behavior of the distributions. We argue that the tails and their asymptotic behavior are very hard to substantiate in realistic engineering systems. In this paper we describe some of the proposed mechanisms for producing power-law tails and show that these mechanisms are not particularly robust. Furthermore, we argue that the data usually available for classifying a distribution is insufficient to classify the tail. Fortunately, the tail has little impact on Internet performance. Thus it is sufficient to focus on mechanisms leading to power-law-like “waists” of the distributions.
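The claim that typical data cannot settle the tail can be illustrated with a small simulation. The sketch below is our own illustration, not the method used in this paper. It uses the Hill estimator, a standard estimator of a power-law tail index and our choice here, on two synthetic samples of equal size: one drawn from a Pareto distribution, which has a true power-law tail, and one from a lognormal, which does not. The helper names (`hill_estimator`, `pareto_sample`, `lognormal_sample`) and all parameters (sample size, tail index 1.2, lognormal sigma 2.0, the values of k) are hypothetical choices made only for this example.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    xs = sorted(sample, reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    threshold = xs[k]  # the (k+1)-th largest value acts as the tail cutoff
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


def pareto_sample(n, alpha, rng):
    # paretovariate(alpha) yields x >= 1 with P(X > x) = x**(-alpha)
    return [rng.paretovariate(alpha) for _ in range(n)]


def lognormal_sample(n, mu, sigma, rng):
    # lognormvariate(mu, sigma): exp of a Normal(mu, sigma) draw; no power-law tail
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    n = 10_000
    pareto = pareto_sample(n, alpha=1.2, rng=rng)
    lognorm = lognormal_sample(n, mu=0.0, sigma=2.0, rng=rng)
    print(f"{'k':>6} {'Pareto(1.2)':>12} {'lognormal':>12}")
    # The estimate depends on how many top observations k are treated as "tail"
    for k in (20, 100, 500, 2000):
        print(f"{k:>6} {hill_estimator(pareto, k):>12.3f} "
              f"{hill_estimator(lognorm, k):>12.3f}")
```

For the Pareto sample the estimates should stay near 1.2 for every k, while for the lognormal sample they tend to drift as k changes. With only ten thousand points, and with noise that grows as k shrinks, the two columns can still be hard to tell apart reliably, and choosing which k to trust is the kind of judgment the data alone does not settle.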
