The structural cause of file size distributions

We propose a user model that explains the shape of the distribution of file sizes in local file systems and on the World Wide Web. We examine evidence from 562 file systems, 38 Web clients, and 6 Web servers, and find that the model describes these systems accurately. We compare it with a previously proposed alternative, the Pareto model. Our results cast doubt on the widespread view that the distribution of file sizes is long-tailed; we discuss the implications of this conclusion for proposed explanations of self-similarity in the Internet.
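The question of whether file sizes are long-tailed comes down to how the tail probability P(X > x) behaves for large x. A Pareto distribution's complementary CDF is a straight line on log-log axes with slope -alpha, while a distribution that is not heavy-tailed bends downward, so its local slope keeps steepening. The sketch below illustrates that diagnostic. It assumes, for illustration only, a lognormal distribution as the non-heavy-tailed alternative. The abstract does not name the distribution of the proposed model, and the parameter values (`xm`, `alpha`, `mu`, `sigma`) are hypothetical, not fitted to the paper's data.

```python
import math

# Hypothetical parameters for illustration only; not estimated from any data set.
PARETO_XM, PARETO_ALPHA = 1.0, 1.2    # Pareto scale (bytes) and shape
LOGN_MU, LOGN_SIGMA = 8.0, 2.0        # mean and sd of ln(size); median ~ e^8 bytes


def pareto_sf(x, xm=PARETO_XM, alpha=PARETO_ALPHA):
    """P(X > x) for a Pareto distribution with scale xm and shape alpha."""
    if x < xm:
        return 1.0
    return (xm / x) ** alpha


def lognormal_sf(x, mu=LOGN_MU, sigma=LOGN_SIGMA):
    """P(X > x) for a lognormal distribution (assumed alternative, see lead-in)."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def loglog_slope(sf, x, factor=10.0):
    """Finite-difference slope of log P(X > x) with respect to log x, over [x, factor*x]."""
    return (math.log(sf(x * factor)) - math.log(sf(x))) / math.log(factor)


if __name__ == "__main__":
    # A constant slope suggests a power-law (long) tail; a steepening slope does not.
    for x in (1e3, 1e5, 1e7):
        print(f"x={x:8.0e}  pareto slope={loglog_slope(pareto_sf, x):6.2f}  "
              f"lognormal slope={loglog_slope(lognormal_sf, x):6.2f}")
```

With these assumed parameters the Pareto slope stays at -1.2 at every scale, while the lognormal slope grows steadily more negative. Empirical size data can be examined the same way by substituting the empirical survival function for `sf`, keeping in mind that finite samples make the far tail noisy.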
