Efficient middleware for Byzantine fault tolerant database replication

Byzantine fault tolerance (BFT) enhances the reliability and availability of replicated systems subject to software bugs, malicious attacks, or other unexpected events. This paper presents Byzantium, a BFT database replication middleware that provides snapshot isolation semantics. It is the first BFT database system that allows concurrent transaction execution without relying on a centralized component, which is essential for achieving both performance and robustness. Byzantium builds on an existing BFT library and extends it with a set of techniques for increasing concurrency in the execution of operations, for optimistically executing operations in a single replica, and for striping and load-balancing read operations across replicas. Experimental results show that our replication protocols introduce only a modest performance overhead for workloads dominated by read-write transactions and perform better than a non-replicated database system for read-only workloads.
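
To make the snapshot isolation semantics mentioned above concrete, the following is a minimal single-process sketch in Python. It is a generic illustration of snapshot reads plus first-committer-wins conflict detection, not Byzantium's implementation: the class names `SnapshotStore` and `Transaction`, the logical commit counter, and the in-memory version lists are all assumptions made for this example, and no replication or Byzantine agreement is modeled.

```python
class SnapshotStore:
    """Toy multi-version store (hypothetical; illustrates snapshot isolation only)."""

    def __init__(self):
        self.commit_ts = 0   # logical clock, advanced once per successful commit
        self.versions = {}   # key -> list of (commit_ts, value) in ascending order

    def begin(self):
        # A transaction reads from the snapshot defined by the current commit timestamp.
        return Transaction(self, self.commit_ts)


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts  # snapshot timestamp
        self.writes = {}          # buffered writes, invisible to others until commit

    def read(self, key):
        # A transaction always sees its own uncommitted writes.
        if key in self.writes:
            return self.writes[key]
        # Otherwise return the newest version committed at or before the snapshot.
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins: abort if any key we wrote has a version
        # committed after our snapshot was taken (a write-write conflict).
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                return False
        self.store.commit_ts += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append(
                (self.store.commit_ts, value)
            )
        return True
```

In this sketch, two transactions that start from the same snapshot and both write the same key cannot both commit: the second `commit()` call returns `False`. Read-only transactions never conflict, which is one reason snapshot isolation suits read-heavy workloads.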
