Integrated Evaluation of Parallel Systems

Abstract: Parallel (multiple-processor) computer systems are used to meet high-performance requirements. Multiple processors can also provide dependability through fault tolerance; however, the mere presence of more than one processor does not guarantee dependability. When a system must be both fast and dependable, its designer must carefully balance the two requirements. The Computer Science Laboratory of The Aerospace Corporation has developed a simulation-based approach for investigating how dependability and performance interact, one that is more flexible, accurate, and cost-effective than other approaches. We define the nature of the analysis problem and discuss our approach to measuring performance and evaluating dependability in a single environment, using two of our integrated tools, HERMES and Guage.