Scalable and Adaptive Data Replica Placement for Geo-Distributed Cloud Storages

In geo-distributed cloud storage systems, data replication is widely used to serve an ever-growing number of users around the world with high data reliability and availability. Optimizing data replica placement has become a fundamental problem for reducing inter-node traffic and the system overhead of accessing associated data items. In the big data era, traditional solutions may suffer from long running times and large overheads when handling the growing scale of data items and time-varying user requests. We therefore propose novel offline community discovery and online community adjustment schemes that solve the replica placement problem in a scalable and adaptive way. The offline scheme can find a replica placement solution based on the average read/write rates over a given period of time. It achieves scalability because 1) its computational complexity is linear in the number of data items, and 2) the data-node communities can evolve in parallel, enabling distributed replica placement. Furthermore, the online scheme adapts to bursty data requests without having to completely override the existing replica placement. Extensive performance evaluations driven by real-world data traces demonstrate the effectiveness of our design in handling large-scale datasets.
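The abstract does not spell out the cost model or the community-based algorithms. As a rough illustration of what placement "based on the average read/write rates" and an online adjustment that leaves most of the placement untouched can look like, the Python sketch below uses an assumed traffic model: a read served from another node costs one unit of inter-node traffic, and every write must be propagated to each replica not on the writer's node. The functions `average_rates`, `traffic_cost`, `place_item`, `place_all`, and `adjust_bursty`, the `burst_factor` threshold, and the node and item names are all hypothetical. The greedy per-item rule is a simple stand-in, not the proposed offline community discovery or online community adjustment schemes.

```python
from collections import defaultdict

# Hypothetical cost model (an assumption, not the paper's formulation):
# a read served by a non-local replica costs one unit of inter-node traffic,
# and every write must be propagated to each replica not on the writer's node.


def average_rates(requests, window_seconds):
    """Turn (item, node, op) records, op in {"r", "w"}, into per-item average
    read and write rates (requests per second) over one observation window."""
    counts = defaultdict(lambda: {"r": defaultdict(int), "w": defaultdict(int)})
    for item, node, op in requests:
        counts[item][op][node] += 1
    return {
        item: (
            {n: c / window_seconds for n, c in ops["r"].items()},
            {n: c / window_seconds for n, c in ops["w"].items()},
        )
        for item, ops in counts.items()
    }


def traffic_cost(replicas, reads, writes):
    """Inter-node traffic rate of one item under the assumed cost model."""
    remote_reads = sum(rate for node, rate in reads.items() if node not in replicas)
    write_updates = sum(
        rate * (len(replicas) - int(node in replicas)) for node, rate in writes.items()
    )
    return remote_reads + write_updates


def place_item(reads, writes, nodes):
    """Greedy placement: start at the busiest node, then add a replica
    wherever doing so lowers the item's traffic cost."""
    primary = max(nodes, key=lambda n: reads.get(n, 0.0) + writes.get(n, 0.0))
    replicas = {primary}
    for node in sorted(nodes, key=lambda n: -reads.get(n, 0.0)):
        if node in replicas:
            continue
        candidate = replicas | {node}
        if traffic_cost(candidate, reads, writes) < traffic_cost(replicas, reads, writes):
            replicas = candidate
    return replicas


def place_all(rates, nodes):
    """Offline pass: items are placed independently, so the work grows
    linearly with the number of items for a fixed set of nodes."""
    return {item: place_item(r, w, nodes) for item, (r, w) in rates.items()}


def adjust_bursty(placement, avg_rates, recent_rates, nodes, burst_factor=2.0):
    """Online pass: re-place only items whose recent request rate exceeds
    burst_factor times their average; every other item keeps its replicas."""
    updated = dict(placement)
    for item, (r, w) in recent_rates.items():
        avg_r, avg_w = avg_rates.get(item, ({}, {}))
        recent_total = sum(r.values()) + sum(w.values())
        if recent_total > burst_factor * (sum(avg_r.values()) + sum(avg_w.values())):
            updated[item] = place_item(r, w, nodes)
    return updated


nodes = ["A", "B", "C"]
requests = (
    [("x", "A", "r")] * 100 + [("x", "B", "r")] * 50 + [("x", "C", "r")] * 5
    + [("x", "A", "w")] * 10
    + [("y", "A", "r")] * 10 + [("y", "B", "r")] * 10 + [("y", "C", "w")] * 50
)
rates = average_rates(requests, window_seconds=10)
placement = place_all(rates, nodes)
print({item: sorted(r) for item, r in placement.items()})
# {'x': ['A', 'B'], 'y': ['C']}

# A read burst on "y" at node A triggers re-placement of "y" only.
recent = average_rates([("y", "A", "r")] * 200, window_seconds=10)
adjusted = adjust_bursty(placement, rates, recent, nodes)
print({item: sorted(r) for item, r in adjusted.items()})
# {'x': ['A', 'B'], 'y': ['A']}
```

Placing each item independently keeps the sketch short, but it ignores correlated access to associated data items, which is one of the overheads the abstract sets out to reduce.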
