YouLighter: An Unsupervised Methodology to Unveil YouTube CDN Changes

YouTube relies on a massively distributed Content Delivery Network (CDN) to stream the billions of videos in its catalogue. Unfortunately, very little information about the design of this CDN is available. This, combined with the pervasiveness of YouTube, poses a big challenge for Internet Service Providers (ISPs), which are compelled to optimize end-users' Quality of Experience (QoE) while having no control over the CDN's decisions.

This paper presents YouLighter, an unsupervised technique to identify changes in the YouTube CDN. YouLighter leverages only passive measurements to cluster co-located identical caches into edge-nodes. This automatically unveils the structure of YouTube's CDN. Further, we propose a new metric, called Pattern Dissimilarity, that compares the clusterings obtained from two different time snapshots to pinpoint sudden changes. While several approaches allow us to compare clustering results obtained from the same dataset, no technique measures the similarity of clusters from different datasets. Hence, we develop a novel methodology, based on the Pattern Dissimilarity, to solve this problem.

By running YouLighter over 10-month-long traces obtained from ISPs, we pinpoint both sudden changes in edge-node allocation and modifications to the cache allocation policy that actually impair the QoE that end-users perceive.
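To make the idea of comparing clusterings from two time snapshots concrete, the following is a minimal sketch and not the Pattern Dissimilarity metric itself, whose definition is not given in this abstract. It assumes each cache is described by a single passively measured value (here a hypothetical RTT in milliseconds), groups caches with a simple gap-based rule as a stand-in for the clustering step, and scores two snapshots by the symmetric average distance between nearest cluster centroids. The function names `cluster_1d`, `centroids`, and `snapshot_dissimilarity` are illustrative only.

```python
from statistics import mean


def cluster_1d(values, gap):
    """Group values into clusters; a new cluster starts whenever the
    distance to the previous (sorted) value exceeds `gap`."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def centroids(clusters):
    """Summarize each cluster by the mean of its members."""
    return [mean(c) for c in clusters]


def snapshot_dissimilarity(cents_a, cents_b):
    """Symmetric average distance from each centroid in one snapshot
    to its nearest centroid in the other (both lists must be non-empty)."""
    def one_way(src, dst):
        return mean(min(abs(s - d) for d in dst) for s in src)
    return (one_way(cents_a, cents_b) + one_way(cents_b, cents_a)) / 2


# Hypothetical per-cache RTTs (ms) observed in three snapshots.
day1 = [10, 11, 12, 40, 41, 42]
day2 = [10, 11, 12, 40, 41, 42]   # same structure as day1
day3 = [10, 11, 12, 90, 91, 92]   # one group of caches has moved away

c1 = centroids(cluster_1d(day1, gap=5))
c2 = centroids(cluster_1d(day2, gap=5))
c3 = centroids(cluster_1d(day3, gap=5))

same = snapshot_dissimilarity(c1, c2)     # unchanged structure
changed = snapshot_dissimilarity(c1, c3)  # one cluster shifted
```

In this toy setting an unchanged snapshot scores zero and a shifted cluster raises the score, so a threshold on the score could flag a sudden change; the real methodology would need richer features and a more robust clustering algorithm.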
