Processor failure recovery for a resource sharing algorithm
As distributed computer systems grow in popularity, the reliability of the system as a whole is becoming more important. A recently published combined resource sharing algorithm showed how the atomic operations required for resource management in a closely coupled multiprocessor system could be provided. This paper describes a recovery system that may be incorporated within the earlier algorithm to enable continued and correct operation of the system despite the failure of one or more component processors. A distributed simulation of the recovery mechanism is described, and results from simulation runs are presented.