Analyzing the Video Popularity Characteristics of Large-Scale User Generated Content Systems

User generated content (UGC), now with millions of video producers and consumers, is reshaping the way people watch video and TV. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and generating new business opportunities. Compared to traditional video-on-demand (VoD) systems, UGC services allow users to request videos from a potentially unlimited selection in an asynchronous fashion. To better understand the impact of UGC services, we have analyzed the world's largest UGC VoD system, YouTube, and a similar popular system in Korea, Daum Videos. In this paper, we first show empirically how UGC services differ fundamentally from traditional VoD services. We then analyze the intrinsic statistical properties of UGC popularity distributions and discuss opportunities to leverage the latent demand for niche videos (the so-called "Long Tail" potential), which goes unmet today because of information filtering or other system scarcity distortions. Based on traces collected across multiple days, we study the popularity lifetime of UGC videos and the relationship between requests and video age. Finally, we measure the level of content aliasing and illegal content in the system and show the problems that aliasing creates for ranking video popularity accurately. The results presented in this paper are crucial to understanding UGC VoD systems and may have major commercial and technical implications for site administrators and content owners.
