Characterization of user access to streaming media files

1. INTRODUCTION

This abstract presents an overview of the log data analysis for two real-time streaming media servers that deliver course content at major public universities in the United States. Both servers contain higher quality videos than servers previously analyzed [1,5,6]. The eTeach server [7] delivers the content for a computer science undergraduate course with an enrollment of 280 students. There are no live lectures for that course; instead, the students request and play the video lectures and laboratory demos that are archived on the server. The BIBS (Berkeley Internet Broadcasting System) server [8] serves lectures for several courses across a number of different fields. The lectures on the BIBS server are first available in a classroom.

We analyze the server logs with three goals: providing data for generating synthetic workloads, gaining insights into caching strategies, and quantifying the effectiveness of recently developed multicast streaming methods (e.g., [4]) in an interactive educational server environment. Our key new results are:

1. For periods of relatively stable client request arrival rate, the client session arrival process on the BIBS server is approximately Poisson, while the time between interactive requests on the eTeach server follows a heavy-tailed distribution.
2. The distribution of file popularity can be modeled by the concatenation of two Zipf-like distributions.
3. On each server, a significant fraction of the files, or file segments, that are requested in a given hour but not in the previous hour are not requested again for several hours. This motivates reevaluating the traditional cache-on-first-miss strategy for streaming media files.
4. High client interactivity is observed in eTeach, leading to large numbers of partial video accesses. However, for the most popular files, all ten-second segments of the media are accessed nearly equally often.
5. Despite the high client interactivity, server bandwidth for the three most frequently accessed files in each period examined on eTeach can be reduced by 40-60% by using multicast delivery.

The eTeach videos are recorded in Windows Media …
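
As an illustration of how results 1 and 2 might feed a synthetic workload generator, the following is a minimal Python sketch under stated assumptions; it is not the authors' generator. It draws session arrivals from a Poisson process and file choices from two Zipf-like pieces joined at a "knee" rank. The function names, the knee rank, the two exponents, and the arrival rate are all hypothetical placeholders, and the scaling that makes the two pieces meet at the knee is one possible reading of "concatenation". None of the values come from the server logs.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def two_zipf_weights(n_files, knee, alpha_head, alpha_tail):
    """Unnormalized popularity weights for file ranks 1..n_files.

    Ranks up to `knee` follow rank**-alpha_head; later ranks follow
    rank**-alpha_tail, scaled so the two pieces meet at the knee.
    All parameters are illustrative assumptions.
    """
    scale = knee ** (alpha_tail - alpha_head)  # continuity at rank == knee
    weights = []
    for rank in range(1, n_files + 1):
        if rank <= knee:
            weights.append(rank ** -alpha_head)
        else:
            weights.append(scale * rank ** -alpha_tail)
    return weights


def synthetic_sessions(n_sessions, rate_per_hour, weights, seed=0):
    """Yield (arrival_time_hours, file_rank) pairs.

    Arrivals form a Poisson process (exponential gaps between sessions);
    each session requests one file drawn from the popularity weights.
    """
    rng = random.Random(seed)
    cumulative = list(accumulate(weights))
    total = cumulative[-1]
    now = 0.0
    for _ in range(n_sessions):
        now += rng.expovariate(rate_per_hour)
        # Inverse-CDF lookup: first rank whose cumulative weight covers the draw.
        rank = bisect_left(cumulative, rng.random() * total) + 1
        yield now, rank


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the code.
    weights = two_zipf_weights(n_files=100, knee=10, alpha_head=0.5, alpha_tail=1.5)
    for arrival, rank in synthetic_sessions(5, rate_per_hour=50.0, weights=weights):
        print(f"t={arrival:.3f} h  file rank {rank}")
```

The sketch models only the stable-rate periods described in result 1. The heavy-tailed gaps between interactive requests on eTeach would need a different interarrival distribution, which is left out here.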

[1]  Jitendra Padhye, et al. Continuous-Media Courseware Server: A Study of Client Interactions, 1999, IEEE Internet Comput.

[2]  Li Fan, et al. Web caching and Zipf-like distributions: evidence and implications, 1999, IEEE INFOCOM '99.

[3]  Peter Parnes, et al. Characterizing user access to videos on the World Wide Web, 1999, Electronic Imaging.

[4]  Gregory D. Abowd, et al. Workload of a Media-Enhanced Classroom Server, 2000.

[5]  Mary K. Vernon, et al. Minimizing Bandwidth Requirements for On-Demand Data Delivery, 2001, IEEE Trans. Knowl. Data Eng.