PROAR: A Weak Consistency Model for Ceph

The primary-copy consistency model used in Ceph cannot meet users' low-latency requirements for write operations. In this paper, we propose PROAR, a weak consistency model based on a distributed hash ring that lets clients commit data only to the primary node while the data is synchronized to the replica nodes asynchronously. With this mechanism, the low-latency requirement for writes can be met; in addition, the workload on the primary node is reduced, and the load across the replica nodes becomes more balanced. We evaluated the proposed scheme on a Ceph storage system with three storage nodes. The experimental results show that PROAR reduces write overhead by about 50% compared with stock Ceph and distributes the workload more evenly across the replica nodes.
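To make the write path concrete, below is a minimal Python sketch of the two ideas the abstract names: consistent-hash-ring placement and a primary commit that is acknowledged before replicas are updated. All identifiers (`HashRing`, `AsyncReplicator`, the in-memory `store`) are hypothetical illustrations, not the paper's or Ceph's actual API.

```python
import hashlib
import threading
import queue

class HashRing:
    """Consistent hash ring mapping object keys to an ordered list of nodes."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def placement(self, key, replicas=3):
        """Return the primary node followed by (replicas - 1) distinct successors."""
        h = self._hash(key)
        # Walk the ring clockwise from the key's position.
        start = next((i for i, (p, _) in enumerate(self._ring) if p >= h), 0)
        chosen = []
        for off in range(len(self._ring)):
            node = self._ring[(start + off) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == replicas:
                break
        return chosen

class AsyncReplicator:
    """Primary-commit write path: ack after the primary write, replicate later."""

    def __init__(self, ring, store):
        self.ring = ring
        self.store = store          # node -> dict; a stand-in for real storage nodes
        self.backlog = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, value):
        primary, *replicas = self.ring.placement(key)
        self.store[primary][key] = value       # synchronous commit on the primary
        self.backlog.put((key, value, replicas))
        return "ack"                           # client sees a single-round-trip ack

    def _drain(self):
        while True:
            key, value, replicas = self.backlog.get()
            for node in replicas:              # asynchronous replica synchronization
                self.store[node][key] = value
```

With `store = {n: {} for n in ring_nodes}`, a client's `write()` returns as soon as the primary's copy is updated, mirroring the single round trip PROAR targets; a real implementation would additionally need journaling and a recovery path for replicas that miss updates.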
