Towards Byzantine Fault Tolerance in Many-Core Computing Platforms

This paper presents a flexible technique that can be applied to many-core architectures to exploit idle resources and ensure reliable system operation. A hypervisor interposes a dynamic fault tolerance layer between the hardware and the OS. Including the hypervisor itself in the sphere of replication avoids introducing a single point of failure. The approach is simpler to implement than specialized hardware- or OS-based techniques, and the level of protection it provides is configurable, ranging from duplex to Byzantine protection. The feasibility of the approach is considered for both near-term and long-term computing platforms.
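To make the range of protection levels concrete, the following minimal sketch compares replica outputs by strict majority vote. It is an illustration under stated assumptions, not the mechanism described in this paper: the function names `vote` and `min_replicas_byzantine` are hypothetical, replica outputs are modeled as plain hashable values, and the 3f + 1 bound is the classical requirement for Byzantine agreement without signed messages.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas.

    Returns None when no strict majority exists, which signals a
    detected but unmasked disagreement (hypothetical helper).
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def min_replicas_byzantine(f: int) -> int:
    """Classical lower bound on replicas needed to tolerate f Byzantine
    faults when messages are not signed (assumption: 3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


# Duplex: two replicas can detect a mismatch but cannot tell which is wrong.
duplex_result = vote([7, 9])

# Triple redundancy: a single faulty output is outvoted by the other two.
triplex_result = vote([7, 9, 7])
```

With two replicas a disagreement can only be detected, whereas three or more replicas allow a single faulty output to be masked. Tolerating arbitrary (Byzantine) behavior additionally requires an agreement protocol among the replicas, which this sketch does not model.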