Towards quality-of-service driven consistency for Big Data management

With the advent of Cloud Computing, Big Data management has become a fundamental challenge in the deployment and operation of distributed, highly available and fault-tolerant storage systems such as the HBase extensible record store. These systems can support geo-replication, which raises the issue of data consistency among distributed sites. To offer a best-in-class service to applications, one wants to maximise performance while minimising latency. For data replication, that means incurring as little latency as possible when moving data between distant data centres. Traditional consistency models pose a significant problem for systems architects, especially when large amounts of data need to be replicated across wide-area networks. In such scenarios eventual consistency may be suitable: although it is not always convenient, latency can be partly reduced in exchange for weaker consistency guarantees, so that data transfers do not impact performance. In contrast, this work proposes a broader range of data semantics for consistency, prioritising critical data at the cost of a minimal latency overhead on the remaining non-critical updates. Finally, we show how these semantics can help in finding an optimal data replication strategy that achieves just the required level of data consistency with low latency and more efficient network bandwidth utilisation.
