Proxy-based recovery for applications on wireless hand-held devices

The low communication bandwidth, slow processors and limited memory of hand-held devices make it undesirable for them to store their own checkpoints or to send process state information over a wireless network. This paper describes an approach to failure recovery for three-tier client-server application environments in which the client applications execute on wireless hand-held devices. The key idea is to have the middle-tier proxy transparently monitor the client's interaction with the back-end server and continuously maintain a copy of the client's state, derived from the messages exchanged between the client and the server. The proxy also sustains the client's connection to the back-end server when the client unexpectedly disconnects. The client participates in neither checkpointing nor message logging, which saves power, processor cycles and bandwidth. The proxy is scalable and enhances back-end server performance. Experimental results are provided for recovery time and runtime overhead.
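
The key idea can be illustrated with a minimal sketch. It is not the paper's implementation: the names `RecoveryProxy`, `ProxySession` and `rebuild_client_state` are hypothetical, the back-end server is modeled as a plain function, and the client's state is assumed to be fully reconstructible from the request/response pairs that pass through the proxy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (request, response) pair observed by the proxy


@dataclass
class ProxySession:
    """Per-client record kept at the proxy (hypothetical structure)."""
    log: List[Message] = field(default_factory=list)
    connected: bool = True


class RecoveryProxy:
    """Middle-tier proxy that logs traffic on behalf of the client."""

    def __init__(self, backend: Callable[[str, str], str]) -> None:
        # Assumption: the back-end server is modeled as backend(client_id, request).
        self.backend = backend
        self.sessions: Dict[str, ProxySession] = {}

    def handle_request(self, client_id: str, request: str) -> str:
        # Forward the request and record the exchange; the client itself
        # performs no checkpointing or logging.
        session = self.sessions.setdefault(client_id, ProxySession())
        session.connected = True
        response = self.backend(client_id, request)
        session.log.append((request, response))
        return response

    def client_disconnected(self, client_id: str) -> None:
        # Keep the session (and thus the server-side context) alive
        # instead of discarding it when the client drops off the network.
        if client_id in self.sessions:
            self.sessions[client_id].connected = False

    def recover(self, client_id: str) -> List[Message]:
        # On reconnection, hand the logged exchanges back to the client.
        session = self.sessions[client_id]
        session.connected = True
        return list(session.log)


def rebuild_client_state(log: List[Message]) -> Dict[str, str]:
    # Assumption: client state is a deterministic function of the messages,
    # here simply the latest response seen for each request.
    state: Dict[str, str] = {}
    for request, response in log:
        state[request] = response
    return state
```

In this sketch a restarted client would call `recover` and pass the result to `rebuild_client_state`, so recovery costs one download from the proxy rather than continuous checkpoint uploads over the wireless link.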
