An Environment for Testing Safety-Critical Distributed Protocols

This paper describes an environment for fault-injection-based testing of protocols that implement fault tolerance and redundancy management in safety-critical distributed real-time systems. Building confidence in the correctness of distributed protocols is intrinsically difficult and requires complementary testing and verification techniques. To this end, we propose a verification approach that involves three steps: i) initial testing in a software simulator, ii) formal verification by model checking, and iii) final testing in a hardware prototype. Here, we describe an integrated test environment intended for the first and third steps. It allows a tester to expose a protocol to various failure scenarios in both a software simulator and a hardware prototype system. Common data formats for defining failure scenarios and for storing the protocols’ responses make it possible to run identical tests in the simulator and the hardware prototype, and they simplify the comparison of test results.