Ordering Transactions with Prediction in Distributed Object Stores

In cloud-scale datacenters, data is commonly sharded (partitioned) across large numbers of nodes. Atomic transactions are typically implemented by running transactions speculatively and then certifying them, aborting those that cause conflicts. In high-contention scenarios, however, this approach has a drawback: rather than achieving any substantial level of concurrency, it prevents concurrency by aborting all but one of the contending transactions. Our work explores a new option. We employ prediction, ordering transactions in advance based on the objects they are likely to access, to provide ACID transactions in a Resilient Archive with Independent Nodes (ACID-RAIN). This preliminary ordering decreases the abort rate and eliminates aborts in error-free executions. To allow fast recovery from failures, our scheme does not introduce any locks. System consistency and durability rely on a single scalable tier of highly available independent logs. Simulations using the Transactional-YCSB workloads show the scalability and benefits of ACID-
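
The contrast between speculative certification and ordering by predicted access sets can be illustrated with a minimal Python sketch. This is not the ACID-RAIN protocol itself: the function names, the per-object queues, and the assumption that every prediction is exact are hypothetical simplifications, and the logs, failures, and mispredictions are ignored.

```python
from collections import defaultdict, deque

def certify_optimistically(txns):
    """Toy certification: all txns ran against the same snapshot, so a txn
    aborts if any object it touched was already written by a committed txn."""
    written, committed, aborted = set(), [], []
    for txn_id, objs in txns:
        if written & set(objs):
            aborted.append(txn_id)
        else:
            committed.append(txn_id)
            written |= set(objs)
    return committed, aborted

def run_in_predicted_order(txns):
    """Toy pre-ordering (assumed scheme): enqueue each txn on every object it
    is predicted to access, then run a txn once it heads all of its queues.
    Returns the rounds of txns that can run concurrently, with no aborts."""
    queues = defaultdict(deque)
    for txn_id, objs in txns:
        for obj in objs:
            queues[obj].append(txn_id)
    pending = {txn_id: set(objs) for txn_id, objs in txns}
    rounds = []
    while pending:
        # The earliest pending txn always heads its queues, so each round progresses.
        ready = [t for t, objs in pending.items()
                 if all(queues[o][0] == t for o in objs)]
        for t in ready:
            for o in pending[t]:
                queues[o].popleft()
            del pending[t]
        rounds.append(ready)
    return rounds

# Three txns contending on one hot object "x" (hypothetical workload).
hot = [("a", {"x"}), ("b", {"x"}), ("c", {"x"})]
print(certify_optimistically(hot))
print(run_in_predicted_order(hot))
```

In this toy model, certification commits only the first of the contending transactions, whereas the predicted order serializes all three and still lets transactions on disjoint objects share a round.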
