Who to blame when YouTube is not working? Detecting anomalies in CDN-provisioned services

Internet-scale services like YouTube are provisioned by large Content Delivery Networks (CDNs), which push content as close as possible to end-users to improve their Quality of Experience (QoE) and to pursue the CDNs' own optimization goals. By adopting space- and time-variant traffic delivery policies, CDNs serve users' requests from multiple servers/caches at different physical locations and at different times. CDN traffic distribution policies can have a significant impact on the traffic routed through the Internet Service Provider (ISP), as well as unexpected negative effects on end-user QoE. When QoE degrades because of faulty CDN server selection, a major problem for the ISP is to avoid being blamed by its customers. In this paper we present a real case study in which Google CDN server selection policies negatively impact the QoE of customers of a major European ISP who are watching YouTube. We argue that it is extremely important for the ISP to detect such events rapidly and automatically, both to increase its visibility into the overall operation of the network and to respond promptly to possible customer complaints. We therefore present an Anomaly Detection (AD) system for detecting unexpected cache-selection changes in the traffic delivered by CDNs. The proposed algorithm improves over traditional AD approaches by analyzing the complete probability distribution of the monitored features, and by adapting itself to dynamic environments, which provides better detection capabilities.
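
The idea of comparing complete distributions rather than summary statistics can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. The sketch assumes that the monitored feature is the identifier of the cache serving each request, that time is split into bins, and that the Jensen-Shannon divergence is an acceptable stand-in for the paper's distance measure. The function names, the `slack` parameter, and the threshold rule are all hypothetical.

```python
import math
from collections import Counter


def distribution(samples):
    """Empirical probability distribution of the observed cache identifiers."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    div = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return div


def is_anomalous(current_bin, reference_bins, slack=1.5):
    """Compare the current time bin against a window of past bins.

    The threshold adapts to the data: it is the largest divergence seen
    between any two reference bins, scaled by `slack` (an assumed knob).
    Returns (flag, score, threshold).
    """
    cur = distribution(current_bin)
    refs = [distribution(r) for r in reference_bins]
    internal = [
        js_divergence(a, b)
        for i, a in enumerate(refs)
        for b in refs[i + 1:]
    ]
    threshold = slack * max(internal, default=0.0)
    score = sum(js_divergence(cur, r) for r in refs) / len(refs)
    return score > threshold, score, threshold


# Hypothetical traffic: caches "A" and "B" normally serve the requests.
reference = [
    ["A"] * 80 + ["B"] * 20,
    ["A"] * 78 + ["B"] * 22,
    ["A"] * 82 + ["B"] * 18,
]
normal_bin = ["A"] * 79 + ["B"] * 21
shifted_bin = ["C"] * 90 + ["A"] * 10  # most requests move to a new cache

normal_flag, _, _ = is_anomalous(normal_bin, reference)
shifted_flag, _, _ = is_anomalous(shifted_bin, reference)
```

Because the threshold is derived from the variability inside the reference window, a window that is naturally noisy tolerates larger deviations before raising an alarm. This is one simple way to read "self-adapting"; the paper's own mechanism may differ.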
