Performance engineering of replica voting protocols for high assurance data collection systems

Real-time data collection in a distributed embedded system must cope with failures such as data corruption by malicious devices and arbitrary message delays in the network. Data collection devices are replicated to deal with such failures, and the replicas vote among themselves to deliver correct data to the end user. The data being voted upon can be large and/or take a long time to compile (for example, images in a terrain surveillance system or transaction histories in an intrusion detection system). The goal of this paper is to engineer the voting protocols to achieve good performance while meeting the reliability requirements of data delivery in a high-assurance setting. The performance metrics are the data transfer efficiency (DTE) and the time to complete a data delivery (TTC). DTE captures the network bandwidth wasted and/or the energy drained in wirelessly connected devices, whereas TTC captures the degradation in user-level QoS due to delayed and/or missed data deliveries. Improving both DTE and TTC is therefore the goal of our performance engineering exercise. Our protocol-level optimizations focus on reducing: (i) the movement of user-level data between voters, (ii) the number of voting actions and messages generated, and (iii) the latency caused by the voting itself. The paper describes these optimizations, along with experimental results from a prototype voting system.
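As an illustration of why reducing the movement of user-level data between voters matters, the following minimal sketch has replicas vote on short digests of their data rather than on the data itself. This is not the protocol described in the paper, whose mechanisms the abstract does not specify; the function names, the use of SHA-256, and the `max_faulty + 1` acceptance threshold are all assumptions made for this example.

```python
import hashlib
from collections import Counter
from typing import Optional


def digest(data: bytes) -> str:
    """Short fingerprint exchanged between voters instead of the full data."""
    return hashlib.sha256(data).hexdigest()


def vote_on_digests(
    proposals: dict[str, bytes], max_faulty: int
) -> Optional[tuple[str, bytes]]:
    """Pick a value backed by at least max_faulty + 1 replicas (hypothetical rule).

    With at most max_faulty bad replicas, any value holding max_faulty + 1
    matching votes is vouched for by at least one correct replica.
    Returns (replica_id, data) for the delivering replica, or None if no
    value reaches the threshold.
    """
    if not proposals:
        return None
    # Only digests are compared; the large payloads are never exchanged.
    digests = {rid: digest(data) for rid, data in proposals.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes < max_faulty + 1:
        return None
    # Deterministically choose one supporter to deliver the single full copy.
    for rid in sorted(digests):
        if digests[rid] == winner:
            return rid, proposals[rid]
    return None


if __name__ == "__main__":
    readings = {"v1": b"image-bytes", "v2": b"image-bytes", "v3": b"corrupted"}
    print(vote_on_digests(readings, max_faulty=1))
```

In this sketch only one full copy of the data moves toward the end user, while the voters exchange fixed-size digests. That is one plausible way to trade a small hashing cost for lower bandwidth use.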
