The complexity of parallel computer systems makes a priori performance prediction difficult and experimental performance analysis crucial. A complete characterization of software and hardware dynamics, needed to understand the performance of high-performance parallel systems, requires execution-time performance instrumentation. Although software recording of performance data suffices for low-frequency events, capture of detailed, high-frequency performance data ultimately requires hardware support if the performance instrumentation is to remain efficient and unobtrusive.
This paper describes the design of HYPERMON, a hardware system to capture and record software performance traces generated on the Intel iPSC/2 hypercube. HYPERMON represents a compromise between fully passive hardware monitoring and software event tracing; software-generated events are extracted from each node, timestamped, and externally recorded by HYPERMON. Using an instrumented version of the iPSC/2 operating system and several application programs, we present a performance analysis of an operational HYPERMON prototype and assess the limitations of the current design. Based on these results, we suggest design modifications that should permit capture of event traces from the coming generation of high-performance distributed-memory parallel systems.