Social-Aware Replication in Geo-Diverse Online Systems

Distributing long-tail content is difficult because each item has a limited number of views, so bandwidth transfer costs are poorly amortized. Two recent trends make this problem harder. First, the increasing popularity of user-generated content and online social networks (OSNs) creates and reinforces such popularity distributions. Second, content is increasingly geo-replicated across multiple points of presence (PoPs) spread around the world to improve quality of experience (QoE) for users. In this paper, we analyze the tradeoff between the “freshness” of the information available to users and WAN bandwidth costs, and we propose ways to reduce the latter through smart scheduling of update propagation. Our approach leverages knowledge of the mapping between social relationships and geographic location, together with the timing regularities and time differences in end-user activity. First, we assess the potential of our approach by implementing a simple social-aware scheduling algorithm that operates under bandwidth budget constraints and by quantifying its benefits through a trace-driven analysis. We show that it can reduce WAN traffic by up to 55 percent compared to an immediate update of all replicas, with minimal effect on information freshness and latency. Second, we build TailGate, a practical system that implements our social-aware scheduling approach and distributes long-tail content across PoPs on the fly at reduced bandwidth cost by flattening the traffic. We evaluate TailGate using traces from an OSN and show that it can decrease WAN bandwidth costs by as much as 80 percent and improve QoE. We also deploy TailGate on PlanetLab and show that, even when only imprecise social information is available, it can still reduce the latency of accessing long-tail YouTube videos by a factor of 2.
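To make the idea of social-aware update scheduling concrete, the following is a minimal illustrative sketch and not the algorithm evaluated in the paper. It assumes that, for each remote PoP hosting friends of the uploader, we already have a predicted hour of first access (derived, hypothetically, from the friends' past activity), and that per-hour link load is tracked as a simple counter. The names `Upload`, `friend_pops`, `link_load`, and `schedule_transfers` are invented for illustration. Each transfer is placed in the least-loaded hour between the upload and the predicted first access, which delays traffic without delaying what the user sees, and PoPs with no friends receive nothing.

```python
from dataclasses import dataclass


@dataclass
class Upload:
    origin_pop: str     # PoP where the content was uploaded
    upload_hour: int    # UTC hour of the upload, 0-23
    friend_pops: dict   # PoP name -> predicted UTC hour (0-23) of first access there


def schedule_transfers(upload, link_load):
    """Choose one transfer hour per remote PoP that hosts friends.

    link_load maps PoP name -> list of 24 per-hour load counters (hypothetical
    units). The list for each chosen PoP is updated in place.
    Returns a dict PoP name -> chosen UTC hour.
    """
    plan = {}
    for pop, access_hour in upload.friend_pops.items():
        if pop == upload.origin_pop:
            continue  # the origin already holds the content
        # Number of whole hours available before the predicted first access.
        window = (access_hour - upload.upload_hour) % 24
        # Candidate slots start at the upload hour and stop before the access
        # hour; if the access is expected within the same hour, send at once.
        candidates = [(upload.upload_hour + k) % 24 for k in range(max(window, 1))]
        # Pick the least-loaded candidate slot (earliest slot wins ties).
        best = min(candidates, key=lambda h: link_load[pop][h])
        link_load[pop][best] += 1
        plan[pop] = best
    return plan
```

In this sketch an immediate-update policy would correspond to always choosing `upload_hour`. The gain comes from moving transfers into quieter slots and from skipping PoPs where no friend is expected to request the content, at the price of relying on the access-time prediction being accurate.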
