Quantifying Eventual Consistency For Aggregate Queries

With the advent of inexpensive cloud computing resources, scalable distributed data stores have surged in popularity [7, 10, 16, 17, 20]. Such systems focus on horizontal scalability and take advantage of cheap, pay-by-the-hour compute nodes provisioned through the cloud [6]. This lets them distribute query and insert load across many “shared nothing” compute nodes, improving both latency and throughput. At the same time, using many compute nodes increases the likelihood that some node has failed at any given time, making availability a critically important quality [10]. Key-value stores typically address this problem by maintaining N redundant replicas of their data set [10, 16], so that if a single node in the system fails, N − 1 nodes replicating the same data remain accessible. Increasing N therefore increases the availability of the system. However, redundant replication introduces the problem of consistency. Since networks are unpredictable, each insert operation will arrive at the N different replicas at different times. This leads to the data

[1] Heiko Schuldt, et al. FAS - A Freshness-Sensitive Coordination Middleware for a Cluster of OLAP Components, 2002, VLDB.

[2] Surajit Chaudhuri, et al. An overview of data warehousing and OLAP technology, 1997, SIGMOD Record.

[3] Jiawei Han, et al. Data Mining: Concepts and Techniques, 2000.

[4] Andrew Rau-Chaplin, et al. Scalable real-time OLAP on cloud architectures, 2015, J. Parallel Distributed Comput.

[5] Ion Stoica, et al. Quantifying eventual consistency with PBS, 2014, CACM.

[6] David Kenneth Gifford, et al. Information storage in a decentralized computer system, 1981.

[7] Arbee L. P. Chen, et al. Evaluating Aggregate Operations Over Imprecise Data, 1996, IEEE Trans. Knowl. Data Eng.

[8] Werner Vogels, et al. Dynamo: Amazon's highly available key-value store, 2007, SOSP.

[9] T. S. Jayram, et al. OLAP over uncertain and imprecise data, 2007, The VLDB Journal.

[10] Rick Cattell, et al. Scalable SQL and NoSQL data stores, 2011, SIGMOD Record.

[11] Ion Stoica, et al. Probabilistically Bounded Staleness for Practical Partial Quorums, 2012, Proc. VLDB Endow.

[12] Deep Ganguli, et al. Druid: a real-time analytical data store, 2014, SIGMOD Conference.

[13] Sally I. McClean, et al. Aggregation of Imprecise and Uncertain Information in Databases, 2001, IEEE Trans. Knowl. Data Eng.

[14] David K. Gifford, et al. Weighted voting for replicated data, 1979, SOSP '79.

[15] Neal Leavitt, et al. Will NoSQL Databases Live Up to Their Promise?, 2010, Computer.

[16] Wilson C. Hsieh, et al. Bigtable: A Distributed Storage System for Structured Data, 2006, TOCS.

[17] Prashant Malik, et al. Cassandra: a decentralized structured storage system, 2010, Operating Systems Review.

[18] Ali Ghodsi, et al. Eventual Consistency Today: Limitations, Extensions, and Beyond, 2013.

[19] M. Herlihy. A quorum-consensus replication method for abstract data types, 1986, TOCS.