Analyzing the Video Popularity Characteristics of Large-Scale User Generated Content Systems

User generated content (UGC), now with millions of video producers and consumers, is reshaping the way people watch video and TV. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and generating new business opportunities. Compared to traditional video-on-demand (VoD) systems, UGC services allow users to request videos from a potentially unlimited selection in an asynchronous fashion. To better understand the impact of UGC services, we have analyzed the world's largest UGC VoD system, YouTube, and a popular similar system in Korea, Daum Videos. In this paper, we first show empirically how UGC services differ fundamentally from traditional VoD services. We then analyze the intrinsic statistical properties of UGC popularity distributions and discuss opportunities to leverage the latent demand for niche videos (the so-called "Long Tail" potential), which goes unmet today because of information filtering or other system scarcity distortions. Based on traces collected across multiple days, we study the popularity lifetime of UGC videos and the relationship between requests and video age. Finally, we measure the level of content aliasing and illegal content in the system and show the problems that aliasing creates for ranking video popularity accurately. The results presented in this paper are crucial to understanding UGC VoD systems and may have major commercial and technical implications for site administrators and content owners.
