Enhancing group communication with self-manageable behavior

Group communication protocols (GCPs) play an important role in the design of modern distributed systems. A typical GCP exchanges control messages to provide message delivery guarantees, and a key point in configuring such a protocol is establishing the right trade-off between message overhead and delivery latency. This trade-off becomes an even greater challenge in systems where computing resources and application requirements may change at runtime. In such scenarios, the configuration of a GCP must be continuously readjusted to attain given performance goals or to adapt to current resource availability. This paper addresses this challenge by proposing self-managing mechanisms, based on feedback control theory, for a GCP specifically designed to be self-manageable; in the proposed protocol, message overhead and delivery latency can be adjusted at runtime to track a new operating set-point. An evaluation performed under varied scenarios shows the effectiveness of our approach.
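To make the idea of tracking a set-point concrete, the following is a minimal sketch of a discrete-time integral controller that tunes the interval between control messages until a measured delivery latency reaches a target. It is an illustration only, not the protocol or controller proposed in the paper: the linear latency model, the gain `ki`, the interval bounds, and all function names are assumptions introduced here.

```python
# Illustrative sketch only: every name, constant, and the latency model
# below is a hypothetical assumption, not taken from the paper.

MIN_INTERVAL_MS = 1.0      # assumed lower bound on the control-message interval
MAX_INTERVAL_MS = 1000.0   # assumed upper bound on the control-message interval


def plant_latency(interval_ms: float) -> float:
    """Toy model: latency grows linearly with the control-message interval."""
    return 5.0 + 0.5 * interval_ms


def overhead_msgs_per_s(interval_ms: float) -> float:
    """Control messages sent per second for a given interval."""
    return 1000.0 / interval_ms


def simulate(setpoint_ms: float, steps: int = 50, ki: float = 0.5):
    """Run an integral controller; return (interval, latency) per step."""
    interval_ms = 100.0  # assumed initial interval
    history = []
    for _ in range(steps):
        latency_ms = plant_latency(interval_ms)   # measure the output
        error = setpoint_ms - latency_ms          # distance from the set-point
        # Integral action: accumulate the error into the control input,
        # then keep the interval within its allowed range.
        interval_ms = min(MAX_INTERVAL_MS,
                          max(MIN_INTERVAL_MS, interval_ms + ki * error))
        history.append((interval_ms, latency_ms))
    return history


if __name__ == "__main__":
    final_interval, final_latency = simulate(setpoint_ms=30.0)[-1]
    print(f"interval={final_interval:.2f} ms, latency={final_latency:.2f} ms, "
          f"overhead={overhead_msgs_per_s(final_interval):.1f} msgs/s")
```

In this toy model a shorter interval lowers latency at the cost of more control messages per second, so moving the latency set-point implicitly selects a different point on the overhead/latency trade-off.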
