A self-organized, fault-tolerant and scalable replication scheme for cloud storage

Failures of all types are common in current datacenters, partly because of the larger scale of the data stored. As data scales up, ensuring its availability becomes more complex, and different availability levels may be required per application or per data item. In this paper, we propose a self-managed key-value store that dynamically allocates the resources of a data cloud to several applications in a cost-efficient and fair way. Our approach offers and dynamically maintains multiple differentiated availability guarantees for each application despite failures. We employ a virtual economy in which each data partition (i.e., a key range in a consistent-hashing space) acts as an individual optimizer and chooses whether to migrate, replicate, or remove itself so as to maximize its net benefit, weighing the utility the partition offers against its storage and maintenance cost. As proved by a game-theoretical model, no migrations or replications occur in the system at equilibrium, which is reached quickly when the query load and the used storage are stable. Moreover, extensive simulation experiments show that our approach dynamically finds the optimal resource allocation, one that balances the query processing overhead and satisfies the availability objectives in a cost-efficient way for different query rates and storage requirements. Finally, we have implemented a fully working prototype of our approach that demonstrates its applicability in real settings.
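
The per-partition decision described above can be illustrated with a minimal Python sketch. It is not the model from the paper: the `Option` class, the `choose_action` function, the explicit "stay" option, the definition of net benefit as utility minus cost, and all numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Option:
    """One candidate action for a partition (hypothetical structure)."""
    action: str     # "stay", "migrate", "replicate" or "remove"
    utility: float  # assumed utility the partition would offer after the action
    cost: float     # assumed storage and maintenance cost after the action

    @property
    def net_benefit(self) -> float:
        # Assumption: net benefit is simply utility minus cost.
        return self.utility - self.cost


def choose_action(options: Sequence[Option]) -> Option:
    """Return the option with the highest net benefit.

    max() keeps the first maximal element, so listing "stay" first means
    a partition only acts when an alternative is strictly better.
    """
    return max(options, key=lambda o: o.net_benefit)


# Illustrative numbers only, in arbitrary virtual-currency units.
options = [
    Option("stay", utility=10.0, cost=4.0),
    Option("migrate", utility=10.0, cost=3.0),    # cheaper server
    Option("replicate", utility=14.0, cost=9.0),  # more utility, extra copy to pay for
    Option("remove", utility=0.0, cost=0.0),
]
print(choose_action(options).action)
```

With these numbers the partition migrates, because moving to the cheaper server yields the highest net benefit. Once no alternative is strictly better than staying, the partition stops moving, which loosely mirrors the equilibrium behaviour the abstract describes.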
