Multi-root I/O virtualization based redundant systems

Redundancy, a method designed to prevent failures caused by software or hardware problems, is one of the most common techniques in fault-tolerant systems. In this paper, we present a multi-root I/O virtualization (MR-IOV) based redundant system architecture that supports high performance, reliability, and scalability, improving on the conventional redundant architecture, which relies on a hardware multiplexer for the fail-over function. To address the drawback of the conventional design, the proposed architecture saves the system statuses in shared memory, and the backup system applies the saved states when it takes over from the primary host on fail-over. Experimental results show that the proposed architecture is feasible and that it improves on the conventional redundant architecture.
