A self-tuning, self-protecting, self-healing session state management layer

Management of semi-persistent state, such as user-session state, is one factor that complicates failure management in clustered three-tier Internet applications [5]. We observe that the specific properties of user-session state can be exploited to design a lightweight state storage layer that offers many of the same ease-of-management and ease-of-recovery properties as stateless components such as Web servers. We describe SSM, a self-tuning, self-protecting, and self-healing session state management layer that provides a storage and retrieval mechanism for semi-persistent, serial-access user session state. SSM is fast, scalable, fault-tolerant, and recovers instantly from individual node failures. Any SSM node may be rebooted at any time and there is no special recovery code, so the performance cost of “eager” recovery is near zero, simplifying recovery policy management when SSM is integrated into a larger

[1] Margo I. Seltzer, et al. Challenges in Embedded Database System Administration, 1999, USENIX Workshop on Embedded Systems.

[2] David R. Cheriton, et al. Leases: an efficient fault-tolerant mechanism for distributed file cache consistency, 1989, SOSP '89.

[3] Dean Jacobs, et al. Distributed Computing with BEA WebLogic Server, 2003, CIDR.

[4] David E. Culler, et al. Overload management as a fundamental service design primitive, 2002, EW 10.

[5] Kishor S. Trivedi, et al. A methodology for detection and estimation of software aging, 1998, Proceedings Ninth International Symposium on Software Reliability Engineering (Cat. No.98TB100257).

[6] George Candea, et al. Crash-Only Software, 2003, HotOS.

[7] Robert H. Thomas, et al. A Majority consensus approach to concurrency control for multiple copy databases, 1979, ACM Trans. Database Syst.

[8] V. Rich. Personal communication, 1989, Nature.

[9] Jim Gray, et al. The Transaction Concept: Virtues and Limitations (Invited Paper), 1981, VLDB.

[10] V. Jacobson, et al. Congestion avoidance and control, 1988, SIGCOMM '88.

[11] William D. Clinger, et al. Generational garbage collection and the radioactive decay model, 1997, PLDI '97.

[12] David E. Culler, et al. Distributed data structures for internet service construction, 2000, USENIX Symposium on Operating Systems Design and Implementation.

[13] David K. Gifford, et al. Weighted voting for replicated data, 1979, SOSP '79.

[14] Thu D. Nguyen, et al. Using Fault Model Enforcement to Improve Availability, 2002.

[15] David E. Culler, et al. Scalable, distributed data structures for internet service construction, 2000, OSDI.

[16] George Candea, et al. Reducing recovery time in a small recursively restartable system, 2002, Proceedings International Conference on Dependable Systems and Networks.

[17] Archana Ganapathi, et al. Why Do Internet Services Fail, and What Can Be Done About It?, 2002, USENIX Symposium on Internet Technologies and Systems.