TailGate: handling long-tail content with a little help from friends

Distributing long-tail content is inherently difficult because bandwidth transfer costs are poorly amortized when content has a limited number of views. Two recent trends are making this problem harder. First, the increasing popularity of user-generated content (UGC) and online social networks (OSNs) creates and reinforces such popularity distributions. Second, the recent trend of geo-replicating content across multiple points of presence (PoPs) spread around the world, done to improve quality of experience (QoE) for users and to provide redundancy, can lead to unnecessary bandwidth costs. We build TailGate, a system that exploits social relationships, regularities in read access patterns, and time-zone differences to efficiently and selectively distribute long-tail content across PoPs. We evaluate TailGate using large traces from an OSN and show that it can decrease WAN bandwidth costs by as much as 80% and reduce latency, improving QoE. We deploy TailGate on PlanetLab and show that even when only imprecise social information is available, TailGate can still decrease the latency for accessing long-tail YouTube videos by a factor of 2.
