Achal: Building Highly Reliable Networked Control Systems

In a highly reliable networked control system, active replication of critical system components is necessary for instantaneous recovery from a crash or a network partition failure. However, due to Byzantine errors, the replicas can diverge and produce incorrect outputs. In this work, we target the specific problem of replica coordination in the presence of environmentally induced Byzantine errors, while addressing challenges and constraints specific to the cyber-physical systems (CPS) domain.
