Empirical Evaluation of Global Memory Support on the CRAY-T3D and CRAY-T3E

Abstract: Performance prediction on parallel machines is notoriously difficult, especially when it relies on designer-supplied machine parameters such as clock speed, network latency, and network bandwidth. The performance observed by an application programmer is a complicated function of the local memory hierarchy on each node, software overheads from the compiler and operating system, and interactions between components of the machine. As a result, understanding and tuning application-level performance is often an ad hoc process, lacking the kinds of models and tools enjoyed by other engineering disciplines. In a previous paper [1], we proposed a "gray-box approach" for measuring machine performance through the use of micro-benchmarks, and applied it to the problem of compiling a global address space language on the Cray T3D. In this paper, we use micro-benchmarks to compare the support for a global address space on two Cray machines, the T3D and the T3E.