Optimizing cost for geo-distributed storage systems in online social networks

Abstract Globally distributed data centers make it possible to deploy geo-distributed Online Social Networks (OSNs). Given the large volume of data that users generate, how to place that data across data centers is a key issue for a geo-distributed storage system. Today's popular OSN providers store users' data in each deployed data center to guarantee access latency. However, this full replication incurs relatively high storage and traffic costs, which greatly increases the expenditure of OSN providers. Data placement based on social graph partitioning is an efficient way to reduce cost, but it requires information about the entire social graph and cannot fully guarantee latency. Recently, schemes that combine partitioning with replication have been proposed to optimize cost while guaranteeing latency, but they have two drawbacks: (1) partitioning and replication are optimized separately, which cannot reduce cost efficiently; (2) the placements of master replicas and slave replicas influence each other, which eventually reduces the effect of the optimization. In this paper, we explore an integrated approach that optimizes partitioning and replication simultaneously without distinguishing replica roles. We propose a lightweight replica placement (LRP) scheme, which performs optimization in a distributed manner and adapts well to dynamic scenarios. Evaluations on two datasets from Twitter and Facebook show that LRP significantly reduces cost compared with state-of-the-art schemes.
