BFT: the time is now

Data centers strive to provide reliable access to the data and services that they host. This reliable access requires the hosted data and services hosted by the data center to be both consistent and available. Byzantine fault tolerance (BFT) replication offers the promise of services that are consistent and available despite arbitrary failures by a bounded number of servers and an unbounded number of clients.

[1]  Andrea C. Arpaci-Dusseau,et al.  IRON file systems , 2005, SOSP '05.

[2]  Ramakrishna Kotla,et al.  High throughput Byzantine fault tolerance , 2004, International Conference on Dependable Systems and Networks, 2004.

[3]  Joel S. Emer,et al.  The soft error problem: an architectural perspective , 2005, 11th International Symposium on High-Performance Computer Architecture.

[4]  Archana Ganapathi,et al.  Why Do Internet Services Fail, and What Can Be Done About It? , 2002, USENIX Symposium on Internet Technologies and Systems.

[5]  Atul Singh,et al.  BFT Protocols Under Fire , 2008, NSDI.

[6]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[7]  Liuba Shrira,et al.  HQ replication: a hybrid quorum protocol for byzantine fault tolerance , 2006, OSDI '06.

[8]  Miguel Oom Temudo de Castro,et al.  Practical Byzantine fault tolerance , 1999, OSDI '99.

[9]  Mark Bickford,et al.  Nysiad: Practical Protocol Transformation to Tolerate Byzantine Failures , 2008, NSDI.

[10]  GhemawatSanjay,et al.  The Google file system , 2003 .

[11]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[12]  Michael K. Reiter,et al.  Low-overhead byzantine fault-tolerant storage , 2007, SOSP.

[13]  John Lane,et al.  Byzantine replication under attack , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[14]  Miguel Castro,et al.  Practical byzantine fault tolerance and proactive recovery , 2002, TOCS.

[15]  Brett D. Fleisch,et al.  The Chubby lock service for loosely-coupled distributed systems , 2006, OSDI '06.

[16]  Lisa Spainhower,et al.  Commercial fault tolerance: a tale of two systems , 2004, IEEE Transactions on Dependable and Secure Computing.

[17]  Junfeng Yang,et al.  EXPLODE: a lightweight, general system for finding serious storage system errors , 2006, OSDI '06.

[18]  Michael K. Reiter,et al.  Fault-scalable Byzantine fault-tolerant services , 2005, SOSP '05.

[19]  Jim Gray,et al.  A census of Tandem system availability between 1985 and 1990 , 1990 .

[20]  Junfeng Yang,et al.  Using model checking to find serious file system errors , 2004, TOCS.

[21]  Michael Dahlin,et al.  BAR fault tolerance for cooperative services , 2005, SOSP '05.

[22]  Arun Venkataramani,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003, SOSP '03.

[23]  Michael Dahlin,et al.  Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults , 2009, NSDI.

[24]  Ramakrishna Kotla,et al.  Zyzzyva , 2007, SOSP.