Fault-tolerant communications processing

This paper addresses the combination of the traditional redundancy approach to fault-tolerant design with the error detection and recovery mechanisms built into most existing communication protocols. The goal is to achieve low-cost fault-tolerant communication processing, transparent to the user, in the presence of individual processor board failures. General techniques for achieving system-level fault tolerance are reviewed. The notion of error control (recovery) used in computer communications is discussed and compared with the idea of fault tolerance and error recovery in computer science. A general multiprocessor model of a network processor is introduced, and a novel technique for achieving fault tolerance in a multiprocessor environment, called redundant task allocation, is described. Some of the issues in, and approaches to, recovering communication protocols after a failure of the underlying hardware, and making them tolerant of such failures, are examined. A system prototype is described, and some simulation results are reported.