Data placement in distributed data centers for improved SLA and network cost

Abstract

Large-scale data-intensive applications serve users by routing service requests to geographically distributed data centers interconnected by Internet links. To achieve good reliability and low data access latency, cloud service providers often place multiple copies of the same data in different data centers. The network communication required to keep these copies updated incurs an operational cost. The penalties for violating the delay Service Level Agreement (SLA) on data access also impose an operational cost on service providers. In this paper, we tackle the problem of placing data in distributed data centers so as to minimize the operational cost incurred by delay SLA violation penalties and inter-data center network communication, assuming that each data item has K replicas. We propose a K-level Cluster-based Data Placement algorithm (K-CDP) for this problem. The algorithm first solves the linear programming relaxation, and the corresponding dual program, of the problem of minimizing the SLA violation penalty incurred by placing one replica of each data item in a data center. Based on the obtained solutions, it clusters the data so that data items with similar sets of placeable data centers form a data cluster. For the data in each cluster, the algorithm then selects the K data centers that minimize the operational cost. We prove that K-CDP is a 2-approximation algorithm for the data placement problem. Our simulation results demonstrate that the proposed algorithm can effectively reduce the penalty cost incurred by delay SLA violations, the network communication cost, and the overall operational cost of data centers.
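The abstract does not give K-CDP's formulation, so the Python sketch below only illustrates the cost trade-off and the cluster-then-select structure under assumed models. It is not the authors' algorithm and carries none of its 2-approximation guarantee. The following are assumptions: a request pays a fixed penalty when its closest replica exceeds a delay bound, and every write is charged on each pair of replica data centers. The LP relaxation and dual step is replaced by a simple rule. An item's "placeable" set is the set of data centers that meet the bound for all of its reading regions, and only items with identical sets are clustered. Each cluster's K data centers are then found by exhaustive search. All names (`cluster_and_place`, `placeable`, `sla`, `unit_penalty`) and the toy numbers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def penalty_cost(item, replicas, delay, sla, unit_penalty):
    """Assumed penalty model: each region's requests go to its closest replica, and
    every request whose delay exceeds the SLA bound costs unit_penalty."""
    cost = 0.0
    for region, rate in item["reads"].items():
        if min(delay[region][c] for c in replicas) > sla:
            cost += rate * unit_penalty
    return cost


def update_cost(item, replicas, link_cost):
    """Assumed update model: every write is exchanged over each pair of replica data centers."""
    return item["writes"] * sum(link_cost[a][b] for a, b in combinations(replicas, 2))


def placeable(item, dcs, delay, sla):
    """Hypothetical stand-in for the LP-derived candidates: data centers from which a
    single replica meets the SLA bound for every region that reads the item."""
    return frozenset(c for c in dcs if all(delay[r][c] <= sla for r in item["reads"]))


def cluster_and_place(items, dcs, k, delay, link_cost, sla, unit_penalty):
    # Group items whose placeable sets are identical (a simplification of "similar").
    clusters = defaultdict(list)
    for name, item in items.items():
        clusters[placeable(item, dcs, delay, sla)].append(name)

    placement = {}
    for members in clusters.values():
        # Total penalty plus update cost of the whole cluster for one choice of K replicas.
        def cluster_cost(replicas):
            return sum(
                penalty_cost(items[m], replicas, delay, sla, unit_penalty)
                + update_cost(items[m], replicas, link_cost)
                for m in members
            )

        # Exhaustive search over all K-subsets; ties go to the first subset enumerated.
        best = min(combinations(dcs, k), key=cluster_cost)
        for m in members:
            placement[m] = best
    return placement


# Toy instance: three data centers, two user regions, delays in ms (all values made up).
dcs = ["A", "B", "C"]
delay = {"east": {"A": 10, "B": 40, "C": 80}, "west": {"A": 80, "B": 40, "C": 10}}
link_cost = {"A": {"B": 1, "C": 3}, "B": {"A": 1, "C": 1}, "C": {"A": 3, "B": 1}}
items = {
    "x": {"reads": {"east": 100}, "writes": 5},
    "y": {"reads": {"east": 50, "west": 50}, "writes": 20},
    "z": {"reads": {"west": 100}, "writes": 5},
}

placement = cluster_and_place(items, dcs, k=2, delay=delay, link_cost=link_cost,
                              sla=30, unit_penalty=2)
print(placement)  # {'x': ('A', 'B'), 'y': ('A', 'C'), 'z': ('B', 'C')}
```

With a 30 ms bound, item `y` is read from both regions, so no single data center is placeable for it. The sketch places it on A and C and pays the most expensive update link (total cost 60) rather than a 100-unit SLA penalty. The exhaustive search enumerates every K-subset of data centers per cluster, so it only suits tiny instances.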
