Wide area placement of data replicas for fast and highly available data access

Recent years have seen rapid growth of online data storage and computing services at various locations around the world. In wide area applications, data can be replicated at multiple locations to serve users with lower latency and higher availability. This paper presents an approach that achieves both fast and highly available data access through periodic migration of data replicas. Such migration strives to maximize a user-defined objective function that incorporates data access delay and availability into a single utility value. To efficiently estimate data access delay and availability for any feasible replica placement, this approach maintains a small data structure that summarizes recent accesses to data replicas. This paper demonstrates, based on an evaluation study, the effectiveness of the developed technique and concludes with plans for future research.
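As a rough illustration of the idea summarized above, the following minimal sketch scores each feasible replica placement with a single utility value and picks the best one. It is not the paper's algorithm. The utility form (availability minus a weighted mean delay), the `delay_weight` parameter, the per-region access counts standing in for the access summary, the latency table, the site names, and the assumption that sites fail independently are all hypothetical choices made for the example.

```python
from itertools import combinations
from math import prod

# Hypothetical summary of recent accesses: client region -> access count.
ACCESS_COUNTS = {"us": 60, "eu": 30, "asia": 10}

# Hypothetical round-trip latencies in milliseconds: client region -> site.
LATENCY_MS = {
    "us": {"us-east": 20, "eu-west": 90, "ap-south": 220},
    "eu": {"us-east": 90, "eu-west": 15, "ap-south": 150},
    "asia": {"us-east": 220, "eu-west": 150, "ap-south": 25},
}

# Hypothetical probability that each site is reachable at a given time.
SITE_AVAILABILITY = {"us-east": 0.99, "eu-west": 0.995, "ap-south": 0.98}


def mean_delay(placement, access_counts, latency_ms):
    """Access-weighted mean delay, assuming each client uses its nearest replica."""
    total = sum(access_counts.values())
    weighted = sum(
        count * min(latency_ms[client][site] for site in placement)
        for client, count in access_counts.items()
    )
    return weighted / total


def availability(placement, site_availability):
    """Probability that at least one replica is reachable.

    Assumes sites fail independently (an assumption of this sketch).
    """
    return 1.0 - prod(1.0 - site_availability[site] for site in placement)


def utility(placement, access_counts, latency_ms, site_availability,
            delay_weight=0.001):
    """Hypothetical utility: availability minus a weighted mean delay."""
    return (availability(placement, site_availability)
            - delay_weight * mean_delay(placement, access_counts, latency_ms))


def best_placement(sites, k, access_counts, latency_ms, site_availability):
    """Exhaustively score every k-site placement and return the best one."""
    return max(
        combinations(sorted(sites), k),
        key=lambda p: utility(p, access_counts, latency_ms, site_availability),
    )


if __name__ == "__main__":
    chosen = best_placement(SITE_AVAILABILITY, 2, ACCESS_COUNTS,
                            LATENCY_MS, SITE_AVAILABILITY)
    print(chosen)
```

Exhaustive search is used here only to keep the example short; it is practical for a handful of candidate sites, and a periodic migration step would rerun the search as the access summary changes.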
