Adding Dynamicity to the Uncertainty that Characterizes Distributed Systems: Challenges Ahead

The uncertainties inherent to distributed systems, such as unpredictable message delays and process failures, gave rise to a great research effort in the past, in which numerous fault-tolerance mechanisms were proposed with a variety of guarantees and underlying system assumptions. The advent of new classes of distributed system applications (such as social networks, security, smart object sharing, etc.) and technologies (VANET, WiMAX, airborne networks, DoD Global Information Grid, P2P) is radically changing the way in which distributed systems are perceived. Such emerging systems have a composition, in terms of the processes participating in the system, that is self-defined at run time depending, for example, on the processes' will to belong to such a system, on their geographical distribution, etc. In this paper, we point to some of the challenges that fault-tolerance solutions must address in light of this new dynamicity dimension, and that will motivate our future collaborative work.
