YouLighter: A Cognitive Approach to Unveil YouTube CDN and Changes

YouTube relies on a massively distributed content delivery network (CDN) to stream the billions of videos in its catalog. Unfortunately, very little information about the design of this CDN is available. This, combined with the pervasiveness of YouTube, poses a major challenge for Internet service providers (ISPs), which are compelled to optimize end-users' quality of experience (QoE) while having almost no visibility into, or understanding of, the CDN's decisions. This paper presents YouLighter, an unsupervised technique that builds upon cognitive methodologies to identify changes in how the YouTube CDN serves traffic. YouLighter relies only on passive measurements and clustering algorithms to group caches that appear colocated and identical into edge-nodes. This automatically unveils the YouTube edge-nodes used by the ISP's customers. Next, we introduce a new metric, called Pattern Dissimilarity, that compares the clustering results obtained from two different time snapshots to pinpoint sudden changes. By running YouLighter over 10-month-long traces obtained from two ISPs in different countries, we pinpoint both sudden changes in edge-node allocation and small alterations to the cache allocation policies, which actually impair the QoE perceived by end-users.
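
The idea of comparing the clustering results of two time snapshots can be illustrated with a minimal sketch. It is not the paper's Pattern Dissimilarity metric, whose definition the abstract does not give. As a stand-in, it scores two sets of clusters by the average best-match Jaccard distance, and all names (`clusters_from_labels`, `snapshot_dissimilarity`) and the example cache identifiers are hypothetical.

```python
from typing import Dict, List, Set


def clusters_from_labels(labels: Dict[str, int]) -> List[Set[str]]:
    """Group cache identifiers by the cluster label assigned to each."""
    groups: Dict[int, Set[str]] = {}
    for cache, label in labels.items():
        groups.setdefault(label, set()).add(cache)
    return list(groups.values())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def snapshot_dissimilarity(old: List[Set[str]], new: List[Set[str]]) -> float:
    """Stand-in score in [0, 1]: 0 means identical clusterings.

    ASSUMPTION: this is a generic best-match Jaccard distance,
    not the Pattern Dissimilarity metric defined in the paper.
    """
    def one_way(src: List[Set[str]], dst: List[Set[str]]) -> float:
        if not src:
            return 0.0 if not dst else 1.0
        if not dst:
            return 1.0
        # For each source cluster, distance to its closest match in dst.
        return sum(1.0 - max(jaccard(s, d) for d in dst) for s in src) / len(src)

    # Average both directions so the score is symmetric.
    return (one_way(old, new) + one_way(new, old)) / 2.0


# Hypothetical snapshots: one edge-node splits into two in the later snapshot.
day1 = clusters_from_labels({"cache-a": 0, "cache-b": 0, "cache-c": 1, "cache-d": 1})
day2 = clusters_from_labels({"cache-a": 0, "cache-b": 0, "cache-c": 1, "cache-d": 2})
score = snapshot_dissimilarity(day1, day2)
```

A score near 0 suggests that the grouping of caches into edge-nodes is stable between the two snapshots, while a larger score flags a change worth inspecting. Any alerting threshold would have to be chosen empirically.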
