A distributed state monitoring service for adaptive application management

Anubis is a simple state monitoring service that supports coordinated action among distributed management agents. It uses a temporal consistency model that addresses symmetric and asymmetric network partitions. We have used Anubis to support distributed management of adaptive applications in grid and utility computing environments, and our experience has shown that the abstraction and properties provided by the service simplify the task of programming distributed management behavior. We support this claim by examining three common use cases that our developers encountered: resource management, lifecycle coordination, and compositional failure management.
