Workload generation for YouTube

This paper presents a workload characterization study of YouTube, the most popular short-video sharing service of Web 2.0. Using a large volume of data gathered over a five-month period, we analyzed the characteristics of around 250,000 popular and regular YouTube videos. In particular, we recursively collected the list of related videos for each video clip and analyzed their statistical behavior. Understanding the traffic of YouTube and similar Web 2.0 video sharing sites is crucial for developing synthetic workload generators. Such workload simulators are required to evaluate methods that address the high bandwidth usage and scalability problems of Web 2.0 sites such as YouTube. The distribution models, in particular the Zipf-like popularity of YouTube's popular videos, suggest that proxy caching of popular videos can reduce network traffic and increase the scalability of the YouTube Web site. The workload characteristics presented in this work enabled us to develop a workload generator for evaluating the effectiveness of this approach.
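The abstract does not describe the generator's internals. Purely as an illustration of why Zipf-like popularity favours proxy caching, the Python sketch below assumes a pure Zipf law over a fixed catalogue of ranked videos, with a hypothetical exponent `alpha`, and an LRU replacement policy at the proxy. The functions `zipf_weights`, `generate_requests` and `lru_hit_ratio` are hypothetical names. None of these parameters come from the measured data, and the sketch is not the authors' workload generator.

```python
import itertools
import random
from collections import OrderedDict


def zipf_weights(n_videos, alpha):
    """Unnormalised Zipf-like weights: P(rank i) is proportional to 1 / i**alpha.

    Assumption: a pure Zipf law with a hypothetical exponent alpha,
    not a model fitted to the YouTube measurements.
    """
    return [1.0 / rank ** alpha for rank in range(1, n_videos + 1)]


def generate_requests(n_requests, n_videos, alpha, seed=0):
    """Synthetic request stream of video indices (0 = most popular video)."""
    rng = random.Random(seed)
    cum = list(itertools.accumulate(zipf_weights(n_videos, alpha)))
    # random.choices accepts unnormalised cumulative weights.
    return rng.choices(range(n_videos), cum_weights=cum, k=n_requests)


def lru_hit_ratio(requests, cache_size):
    """Fraction of requests served by a hypothetical LRU proxy cache of cache_size videos."""
    cache = OrderedDict()
    hits = 0
    for video in requests:
        if video in cache:
            hits += 1
            cache.move_to_end(video)  # mark as most recently used
        else:
            cache[video] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used video
    return hits / len(requests) if requests else 0.0


if __name__ == "__main__":
    reqs = generate_requests(100_000, 10_000, alpha=0.8, seed=42)
    for size in (100, 1_000, 5_000):
        print(size, round(lru_hit_ratio(reqs, size), 3))
```

Because the most popular videos receive a disproportionate share of requests under a Zipf-like law, even a cache holding a small fraction of the catalogue can absorb a noticeable share of requests. The hit ratios printed by this toy setup depend entirely on the assumed `alpha` and catalogue size.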