Log-based recovery for middleware servers

We have developed new methods for log-based recovery of middleware servers that involve thread pooling, private in-memory state for clients, shared in-memory state, and message interactions among middleware servers. Because crashes are rare, shared state is relatively small, and shared state is read and written infrequently, we are able to reduce the overhead of message logging and shared state logging while maintaining recovery independence. Checkpointing has a very small impact on ongoing activities while still reducing recovery time. Our recovery mechanism enables client private states to be recovered in parallel after a crash. We have implemented a prototype recovery infrastructure on a commercial middleware server platform; it demonstrates that the system complexity is manageable and shows promising performance results.
