Improving Cloud-Based Online Social Network Data Placement and Replication

Online social networks make it easier for people to find and communicate with others based on shared interests, ideas, and group associations. Popular social networks such as Facebook and Twitter have hundreds of millions or even billions of users scattered around the world, sharing interconnected data. Users demand low-latency access not only to their own data but also to their friends' data, which is often very large (e.g. videos and pictures). However, social network service providers have limited budgets and cannot afford to store every piece of data everywhere to minimise users' data access latency. Geo-distributed cloud services, with virtually unlimited capabilities, are well suited to storing large-scale social network data in different geographical locations. This paper addresses key problems including how to optimally store and replicate these huge datasets and how to distribute requests across datacenters. A novel genetic algorithm-based approach is used to find a near-optimal number of replicas for every user's data and a near-optimal placement of those replicas, minimising monetary cost while satisfying latency requirements for all users. Experiments on a Facebook dataset show that our technique outperforms other representative placement and replication strategies.
