Performance and dependability evaluation of scalable massively parallel computer systems with conjoint simulation

Computer systems are increasingly part of daily life: business and industry rely on their services, and human health can depend on their correct functioning. Computer systems used for critical tasks must be carefully designed and tested during the early design stage, the prototype phase, and their operational life, and methods and tools are required to support this task. In this article, we address system-level performance and dependability analysis of fault-tolerant scalable computer systems. We present a modeling methodology called "Conjoint Simulation," which is based on partitioning the system model and combining several modeling techniques. Object-oriented model construction and process-based simulation are applied for architecture and workload modeling, while timed Petri nets are the core modeling technique for representing failure scenarios and repair policies. Splitting the overall model and using a suitable modeling technique for each part eases the development, maintenance, and extension of large-scale, complex simulation models. Furthermore, we provide techniques for hierarchical model construction, object-oriented workload modeling, and simulated error injection in order to perform combined performance and dependability analysis.
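The abstract assigns failure scenarios and repair policies to timed Petri nets and keeps the workload in a separate model. The sketch below is a minimal illustration of that split under stated assumptions, not the authors' implementation. It assumes exponentially distributed failure and repair delays, uses a single repair facility as the repair policy, and reduces the performance side to a reward equal to the number of operational nodes, that is, throughput is assumed to scale linearly with working nodes. The names `Transition`, `TimedPetriNet`, and `simulate` are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Transition:
    """Timed transition: moves one token from each input place to each
    output place after an exponentially distributed delay (assumption)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    # Firing rate as a function of the current marking; 0 disables it.
    rate: Callable[[Dict[str, int]], float]


@dataclass
class TimedPetriNet:
    marking: Dict[str, int]
    transitions: List[Transition] = field(default_factory=list)

    def enabled(self) -> List[Transition]:
        # A transition is enabled if every input place holds a token
        # and its marking-dependent rate is positive.
        return [t for t in self.transitions
                if all(self.marking[p] > 0 for p in t.inputs)
                and t.rate(self.marking) > 0]

    def fire(self, t: Transition) -> None:
        for p in t.inputs:
            self.marking[p] -= 1
        for p in t.outputs:
            self.marking[p] += 1


def simulate(n_nodes: int, fail_rate: float, repair_rate: float,
             horizon: float, seed: int = 1) -> float:
    """Return the time-averaged fraction of operational nodes, used here
    as a simple stand-in for a performability reward."""
    rng = random.Random(seed)
    net = TimedPetriNet(
        marking={"up": n_nodes, "down": 0},
        transitions=[
            # Each operational node fails independently.
            Transition("fail", ["up"], ["down"],
                       lambda m: fail_rate * m["up"]),
            # Single repair facility: repair rate does not scale with "down".
            Transition("repair", ["down"], ["up"],
                       lambda m: repair_rate),
        ],
    )
    t = 0.0
    up_time = 0.0  # integral of the "up" token count over time
    while True:
        enabled = net.enabled()
        total = sum(tr.rate(net.marking) for tr in enabled)
        # With exponential delays, the time to the next firing is
        # exponential with the summed rate (race semantics).
        dt = rng.expovariate(total) if total > 0 else float("inf")
        if t + dt >= horizon:
            up_time += net.marking["up"] * (horizon - t)
            break
        up_time += net.marking["up"] * dt
        t += dt
        # Choose the winning transition with probability proportional to its rate.
        r = rng.random() * total
        for tr in enabled:
            r -= tr.rate(net.marking)
            if r <= 0:
                break
        net.fire(tr)
    return up_time / (n_nodes * horizon)
```

For a single node with failure rate 1 and repair rate 9, the estimate should approach the steady-state availability 9 / (1 + 9) = 0.9 as the horizon grows. Swapping the constant repair lambda for one proportional to `m["down"]` would instead model one repair crew per failed node, which shows how a repair policy can be changed in the Petri net without touching the workload side.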
