Efficient Fault Tolerance using Intel MPX and TSX

Hardware faults can cause data corruptions during computation, and they are especially harmful if these corruptions happen in data pointers. Existing solutions, however, incur high performance overheads, which is unacceptable for computeintensive applications. In this work, we present an efficient faulttolerance approach against hardware faults by exploiting the new extensions to the x86 architecture. In particular, we propose that Intel MPX can be effectively used to detect faults in data pointers, while Intel TSX can provide roll-back recovery against these corruptions. Our preliminary evaluation supports this hypothesis, and we estimate the average overhead to be roughly around 50%.

[1]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.

[2]  Christian W. Otterstad A brief evaluation of Intel®MPX , 2015, 2015 Annual IEEE Systems Conference (SysCon) Proceedings.

[3]  Christopher J. Hughes,et al.  Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[4]  Michael A. Bender,et al.  Fault tolerant data structures , 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[5]  Christof Fetzer,et al.  HAFT: hardware-assisted fault tolerance , 2016, EuroSys.

[6]  Christof Fetzer,et al.  ELZAR: Triple Modular Redundancy Using Intel AVX (Practical Experience Report) , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[7]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[8]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multi-threading alternatives , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.