Towards Optimal Data Replication Across Data Centers

Data centers that provide both computation and storage resources have proliferated at diverse geographic locations. In a variety of wide-area applications, data can be replicated to serve users with lower latency. This paper presents a technique that reduces overall data access delay by gradually migrating data replicas. In contrast to previous solutions, which either select replica locations randomly or process a large log of past data accesses, this technique maintains only a small, decentralized summary of recent data accesses while achieving near-optimal performance. The paper also presents an evaluation study that substantiates the effectiveness of the technique and outlines plans for extending the current research.
