Profit-based file replication in data intensive cloud data centers

Many applications running in cloud data centers are data intensive, processing large amounts of data inside the data center. File replication, which brings data files closer to the computing virtual machines (VMs), is an effective strategy that reduces data access latency and bandwidth consumption, thus saving energy in data centers. In this paper, we formulate and study the file replication problem (FRP) in data centers, with the goal of minimizing the total energy consumption of data file access inside data centers. In contrast to existing work on data replication in data centers, which is mainly heuristic-based, we design a time-efficient approximation algorithm with a performance guarantee for energy consumption in file replication. In particular, our file replication algorithm is based on a novel concept called “profit”, and optimizes over a submodular function that can be computed efficiently. Our algorithm yields a total profit of file replication that is at least half of that achieved by an optimal replication solution. We also design two energy- and time-efficient heuristic file replication algorithms. Via extensive simulations using CloudSim, a popular simulation framework for cloud computing, we compare all the algorithms under different network scenarios. We show that the approximation algorithm outperforms the two heuristics under different network parameters, while all three effectively reduce the total energy consumption of data access in data centers.
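The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of a greedy, profit-driven replica placement, not the paper's actual method. It assumes a simplified model that is not taken from the source: access cost is request frequency times hop distance to the nearest copy, every file has unit size, `capacity` counts spare replica slots per node, and the "profit" of a replica is the access cost it saves. All names (`access_cost`, `greedy_replicate`, `dist`, `origin`, `demand`, `capacity`) are hypothetical.

```python
from itertools import product


def access_cost(file_id, copies, demand, dist):
    """Cost of serving one file: each requester reads from its nearest copy."""
    return sum(
        freq * min(dist[node][c] for c in copies[file_id])
        for (node, f), freq in demand.items()
        if f == file_id
    )


def greedy_replicate(dist, origin, demand, capacity):
    """Repeatedly place the (file, node) replica with the largest positive profit.

    dist     -- dist[u][v] is the assumed hop distance between nodes u and v
    origin   -- origin[file] is the node holding the original copy
    demand   -- demand[(node, file)] is the assumed request frequency
    capacity -- capacity[node] is the number of spare replica slots (assumption)
    """
    copies = {f: {n} for f, n in origin.items()}
    free = dict(capacity)
    total_profit = 0
    while True:
        best = None
        for f, n in product(copies, free):
            if free[n] == 0 or n in copies[f]:
                continue
            before = access_cost(f, copies, demand, dist)
            copies[f].add(n)  # tentatively place the replica
            after = access_cost(f, copies, demand, dist)
            copies[f].remove(n)  # undo the tentative placement
            profit = before - after  # access cost saved by this replica
            if profit > 0 and (best is None or profit > best[0]):
                best = (profit, f, n)
        if best is None:  # no remaining placement saves anything
            return copies, total_profit
        profit, f, n = best
        copies[f].add(n)
        free[n] -= 1
        total_profit += profit


# Toy line topology 0-1-2-3; file "a" lives on node 0 and is read from nodes 2 and 3.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
copies, total = greedy_replicate(
    dist,
    origin={"a": 0},
    demand={(3, "a"): 5, (2, "a"): 1},
    capacity={0: 0, 1: 0, 2: 1, 3: 1},
)
```

In this toy run the first replica goes to node 3, where most of the demand originates, and the second replica on node 2 saves much less. This diminishing return is the property that a submodular profit function captures.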
