Distributed and fault-tolerant execution framework for transaction processing

There is a growing need for efficient distributed computing for transaction processing. One of the key requirements for runtime systems in distributed environments is fault tolerance. Such a system must preserve data consistency at transaction boundaries so that, after any component failure, it can resume ongoing tasks from checkpoints with consistent data. Another key requirement is that the system be lightweight enough in normal execution to provide scalable performance. This paper presents the design and implementation of a new fault-tolerant execution framework that addresses both of these requirements. We replicate each partition of the distributed persistent data on three nodes (a triplet) with two different types of backups, one using warm replication and the other using cold replication. For node failures, the system is automatically recoverable unless all three nodes in any triplet fail at the same time. The system tolerates simultaneous two-node failures in any triplet in most cases. Our results show a new trade-off: a 43% performance improvement can be achieved by slightly compromising system availability.
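The triplet scheme and its recoverability condition can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `Triplet`, `place_partitions`, and `is_recoverable` are hypothetical, and the round-robin placement is an assumption, since the abstract does not say how partitions are assigned to nodes. The sketch models only the stated rule that the system is recoverable unless all three nodes of some triplet fail at the same time; it does not model the two-node failure cases that the abstract says are tolerated only in most cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One data partition and the three nodes that hold it (hypothetical model)."""
    partition: int
    primary: str  # node serving the partition in normal execution
    warm: str     # node holding the warm-replicated backup
    cold: str     # node holding the cold-replicated backup

    def nodes(self):
        return (self.primary, self.warm, self.cold)


def place_partitions(nodes, num_partitions):
    """Assign each partition to three distinct nodes.

    Round-robin placement is an assumption made for illustration only.
    """
    if len(nodes) < 3:
        raise ValueError("a triplet needs at least three nodes")
    n = len(nodes)
    return [
        Triplet(p, nodes[p % n], nodes[(p + 1) % n], nodes[(p + 2) % n])
        for p in range(num_partitions)
    ]


def is_recoverable(triplets, failed):
    """Return True unless some triplet has lost all three of its nodes."""
    failed = set(failed)
    return all(
        any(node not in failed for node in t.nodes())
        for t in triplets
    )


if __name__ == "__main__":
    cluster = ["n0", "n1", "n2", "n3"]
    triplets = place_partitions(cluster, num_partitions=4)
    # Two simultaneous failures leave every triplet with a survivor.
    print(is_recoverable(triplets, {"n0", "n1"}))
    # Losing n0, n1, and n2 wipes out the triplet for partition 0.
    print(is_recoverable(triplets, {"n0", "n1", "n2"}))
```

With four nodes and four partitions, every pair of simultaneous node failures leaves at least one surviving copy of each partition under this simplified rule, while any three failures remove one triplet entirely.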
