Statistics and Social Network of YouTube Videos

Since its establishment in early 2005, YouTube has become the most successful Internet site providing a new generation of short video sharing service. YouTube has a great impact on Internet traffic today, yet it suffers from a severe scalability problem. Understanding the characteristics of YouTube and similar sites is therefore essential to network traffic engineering and to the sustainable development of such sites. To this end, we crawled the YouTube site for four months, collecting data on more than 3 million YouTube videos. In this paper, we present a systematic and in-depth measurement study of the statistics of YouTube videos. We find that YouTube videos have noticeably different statistics from traditional streaming videos, ranging from length and access pattern to growth trend and active life span. We also investigate the social networking in YouTube videos, as this is a key driving force behind the site's success. In particular, we find that the links to related videos generated by uploaders' choices have clear small-world characteristics. This indicates that the videos are strongly correlated with each other, and it creates opportunities for developing novel techniques to enhance service quality.
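
To make the small-world claim concrete, the following is a minimal sketch of the two quantities usually used to test for it: the average clustering coefficient and the characteristic path length. A graph is commonly called small-world when its clustering is high, as in a regular lattice, while its path length stays short, as in a random graph. The sketch assumes the related-video links are treated as an undirected graph stored as a dictionary of neighbor sets; the function names and the toy ring lattice are illustrative assumptions, not the crawl data or the code used in the study.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbors (assumed symmetric).
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # fewer than two neighbors: contributes 0
        nbr_list = list(nbrs)
        # Count edges that exist among this node's neighbors.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs."""
    total, pairs = 0, 0
    for src in adj:
        # Breadth-first search from src gives hop distances.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def ring_lattice(n, k):
    """Toy graph: each node linked to its k nearest nodes on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


if __name__ == "__main__":
    g = ring_lattice(20, 2)  # hypothetical stand-in for a video graph
    print("clustering:", clustering_coefficient(g))
    print("path length:", characteristic_path_length(g))
```

On a real video graph, both values would be compared against a random graph with the same number of nodes and edges; the breadth-first search from every node is quadratic in the worst case, so a crawl of millions of videos would likely need sampling of source nodes.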
