1.1 Introduction
We are faced today with a confluence of antagonistic aims when designing and deploying distributed systems. On the one hand, our applications have to achieve timeliness goals, dictated both by QoS expectations for on-line services (e.g., time-bounded transactions) and by technical issues of a real-time nature involved in the deployment of certain services (e.g., multimedia rendering). On the other hand, the open and large-scale environments where applications and users execute and evolve exhibit uncertain timeliness or synchrony. Likewise, services, despite their sometimes critical nature (not only money-critical, but also privacy- or even safety-critical), are increasingly deployed on-line or through open networks. They are required to be resilient to intrusions, despite the elusiveness of the attacks they are subject to and the pervasiveness and subtlety of vulnerabilities in the relevant systems. In other words, the environment in which these services have to operate exhibits uncertain behavior: we cannot predict all possible present and future attacks; we cannot diagnose all vulnerabilities.
In the previous paragraph, we essentially talked about uncertainty, the grand challenge faced by distributed system researchers and designers. When talking about uncertainty, 'impossibility' and 'probability' are words that come to mind. The literature has relevant examples of being pessimistic and accepting uncertainty, either showing impossibility results [1.1] or producing solutions that are uncertain, albeit quantifiably so [1.2, 1.3]. Other works have methodically studied what can be done when the system is incrementally less uncertain [1.4]. Alternatively, other approaches are more optimistic: they assume that the system has periods of determinism alternating with uncertainty, and try to identify and successfully exploit those (sometimes scarce) periods to perform useful tasks [1.5, 1.6].
Nevertheless, a designer does not make strong assumptions about synchrony, or security, or structure, just for the sake of it. They are made because
[1] Miguel Correia, et al. The Design of a COTS Real-Time Distributed Security Kernel. EDCC, 2002.
[2] Marcos K. Aguilera, et al. On the Impact of Fast Failure Detectors on Real-Time Fault-Tolerant Systems. DISC, 2002.
[3] Rachid Guerraoui, et al. A realistic look at failure detectors. Proceedings of the International Conference on Dependable Systems and Networks, 2002.
[4] Antonio Casimiro, et al. The Timely Computing Base Model and Architecture. IEEE Trans. Computers, 2002.
[5] Nancy A. Lynch, et al. Consensus in the presence of partial synchrony. JACM, 1988.
[6] Miguel Correia, et al. Efficient Byzantine-resilient reliable multicast on a hybrid failure model. Proceedings of the 21st IEEE Symposium on Reliable Distributed Systems, 2002.
[7] Flaviu Cristian, et al. Probabilistic clock synchronization. Distributed Computing, 1989.
[8] Michael O. Rabin, et al. Randomized byzantine generals. 24th Annual Symposium on Foundations of Computer Science (SFCS 1983), 1983.
[9] Flaviu Cristian, et al. The Timed Asynchronous Distributed System Model. IEEE Trans. Parallel Distributed Syst., 1999.
[10] Achour Mostéfaoui, et al. Computing Global Functions in Asynchronous Distributed Systems with Perfect Failure Detectors. IEEE Trans. Parallel Distributed Syst., 2000.
[11] Antonio Casimiro, et al. The timely computing base: Timely actions in the presence of uncertain timeliness. Proceedings of the International Conference on Dependable Systems and Networks (DSN 2000), 2000.
[12] Nancy A. Lynch, et al. Impossibility of distributed consensus with one faulty process. JACM, 1985.
[13] Sam Toueg, et al. Unreliable failure detectors for reliable distributed systems. JACM, 1996.