Loki: a state-driven fault injector for distributed systems

Distributed applications can fail in subtle ways that depend on the state of multiple parts of a system. This complicates the validation of such systems via fault injection, since it suggests that faults should be injected based on the global state of the system. In Loki, fault injection is performed based on a partial view of the global state of a distributed system, i.e., faults injected in one node of the system can depend on the state of other nodes. After the faults are injected, a post-runtime analysis that uses off-line clock synchronization places events and injections on a single global timeline and determines whether the intended faults were properly injected. Finally, experiments containing successful fault injections are used to estimate the specified measures. In addition to briefly reviewing the concepts behind Loki and its organization, we detail Loki's user interface. In particular, we describe the graphical user interfaces for specifying state machines and faults, for executing a campaign, and for verifying whether the faults were properly injected.
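
The post-runtime check described above can be illustrated with a minimal sketch. It is not Loki's actual implementation. It assumes that off-line clock synchronization yields, for each node, an offset to the global timeline together with a bounded error, and that an injection counts as proper only if it certainly fell inside the interval during which the remote node was in the intended state. The names ClockBound and injection_verified are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClockBound:
    """Hypothetical result of off-line clock synchronization for one node.

    Assumption: global_time = local_time + offset, known only to within +/- err.
    """
    offset: float
    err: float

    def to_global(self, local_time: float) -> Tuple[float, float]:
        """Map a local timestamp to a (lower, upper) interval on the global timeline."""
        center = local_time + self.offset
        return (center - self.err, center + self.err)


def injection_verified(
    inject_local: float,
    inject_clock: ClockBound,
    enter_local: float,
    exit_local: float,
    state_clock: ClockBound,
) -> bool:
    """Return True only if the injection certainly happened while the remote
    node was in the intended state, despite clock uncertainty."""
    inj_lo, inj_hi = inject_clock.to_global(inject_local)
    _, enter_hi = state_clock.to_global(enter_local)  # latest possible state entry
    exit_lo, _ = state_clock.to_global(exit_local)    # earliest possible state exit
    return enter_hi <= inj_lo and inj_hi <= exit_lo


# Node A injects the fault; node B is the node whose state the fault depends on.
clock_a = ClockBound(offset=0.0, err=0.5)
clock_b = ClockBound(offset=2.0, err=0.5)

# B is in the intended state from local time 10 to local time 20.
ok = injection_verified(15.0, clock_a, 10.0, 20.0, clock_b)
too_early = injection_verified(12.6, clock_a, 10.0, 20.0, clock_b)
```

The check is deliberately conservative: an injection whose uncertainty interval overlaps a state boundary is rejected, so only experiments with unambiguous injections would be kept for estimating measures.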
