The stretched exponential distribution of internet media access patterns

The widely accepted Zipf-like access pattern of Web workloads is based mainly on Internet measurements taken when text-based content dominated Web traffic. However, with the dramatic increase of media traffic on the Internet, a number of studies have observed inconsistencies between the access patterns of media objects and the Zipf model. A sound understanding of media access patterns is essential to guide Internet system design and management, including resource provisioning and performance optimization. In this paper, we study a large variety of media workloads collected from both the client and server sides of different media systems with different delivery methods. Through extensive analysis and modeling, we find that: (1) the object reference ranks of all these workloads follow the stretched exponential (SE) distribution, despite their different media systems and delivery methods; (2) one parameter of this distribution characterizes the media file sizes well, and the other characterizes the aging of media accesses; (3) some biased measurements may lead to Zipf-like observations of media access patterns; and (4) the deviation of the media access pattern from the Zipf model in these workloads increases with workload duration. We further analyze the effectiveness of media caching with a mathematical model. Compared with Web caching under the Zipf model, media caching under the SE model is far less effective unless the cache is enormously large. This indicates that many previous studies based on a Zipf-like assumption have potentially overestimated the benefit of media caching, and that an effective media caching system must be able to scale its storage to accommodate the growth of media content over a long time. Our study provides an analytical basis for applying a P2P model, rather than a client-server model, to build large-scale Internet media delivery systems.
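The contrast between the two models can be illustrated with a minimal sketch. The abstract does not give the functional forms, so the code assumes the usual rank form of the SE distribution, y_i^c = b - a ln(i), and a Zipf-like form y_i = top / i^alpha. The function names, the normalization b = 1 + a ln(n), and all parameter values (c, a, alpha, n) are hypothetical choices for illustration, not values taken from the paper.

```python
import math

def zipf_counts(n, alpha, top):
    """Zipf-like reference counts: y_i = top / i**alpha for ranks 1..n."""
    return [top / i ** alpha for i in range(1, n + 1)]

def se_counts(n, c, a):
    """Stretched exponential rank counts: y_i**c = b - a*ln(i).

    b = 1 + a*ln(n) is chosen so the least popular object (rank n) has
    one reference; this normalization is an assumption of the sketch.
    """
    b = 1.0 + a * math.log(n)
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n + 1)]

def top_k_share(counts, k):
    """Fraction of all references that go to the k most popular objects."""
    return sum(counts[:k]) / sum(counts)

# Illustrative parameters only (not fitted to any workload).
n = 10_000
se = se_counts(n, c=0.2, a=0.5)
zipf = zipf_counts(n, alpha=0.8, top=se[0])

# In SE scale (y**c against ln i) the SE curve is a straight line of slope -a.
slope = (se[99] ** 0.2 - se[9] ** 0.2) / (math.log(100) - math.log(10))
print("SE-scale slope:", round(slope, 6))

# Share of references captured by a cache holding the top-k objects.
for k in (100, 1000):
    print(k, round(top_k_share(se, k), 3), round(top_k_share(zipf, k), 3))
```

The top-k share is only a crude stand-in for a cache hit ratio under a static popularity ranking; how the two models compare depends on the chosen parameters, and the sketch is not the mathematical caching model analyzed in the paper.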
