A hierarchical heterogeneous solution to high performance cost-efficient computing
This thesis outlines a cost-effective multiprocessor architecture that weighs system cost alongside delivered performance. The proposed architecture, HPAM, is organized as a Hierarchy of homogeneous Processor-And-Memory subsystems. Processor speeds and interconnection technology vary across the levels of the hierarchy. The proposed multilevel processor configuration uses fast and costly resources sparingly to reduce sequential and low-parallelism bottlenecks. The resulting organization aims to balance cost, speed and parallelism granularity.
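The intuition behind using a few fast processors sparingly can be illustrated with a toy Amdahl-style model. This is only a sketch under simplifying assumptions and is not the cost model used in the thesis: the function names, the speeds, the per-processor costs and the definition of cost-efficiency as 1 / (time x cost) are all hypothetical choices made for illustration.

```python
def exec_time(seq_fraction, seq_speed, n_parallel, parallel_speed=1.0):
    """Normalized run time of a program whose sequential part runs on one
    processor of speed `seq_speed` and whose parallel part is spread evenly
    over `n_parallel` processors of speed `parallel_speed`.
    Communication cost is ignored (a simplifying assumption)."""
    sequential = seq_fraction / seq_speed
    parallel = (1.0 - seq_fraction) / (n_parallel * parallel_speed)
    return sequential + parallel


def cost_efficiency(time, cost):
    """Hypothetical metric: performance (1 / time) delivered per unit cost."""
    return 1.0 / (time * cost)


# Illustrative, made-up parameters: 10% sequential work, 16 slow processors
# of unit speed and unit cost, and one fast processor that is 4x faster and
# 4x more expensive.
SEQ_FRACTION = 0.1

# One-level machine: the sequential part also runs on a slow processor.
homogeneous_time = exec_time(SEQ_FRACTION, seq_speed=1.0, n_parallel=16)
homogeneous_cost = 16 * 1.0

# Two-level machine: the fast processor handles the sequential part.
two_level_time = exec_time(SEQ_FRACTION, seq_speed=4.0, n_parallel=16)
two_level_cost = 16 * 1.0 + 4.0

homogeneous_eff = cost_efficiency(homogeneous_time, homogeneous_cost)
two_level_eff = cost_efficiency(two_level_time, two_level_cost)
```

Under these assumed numbers the two-level configuration is both faster and more cost-efficient than the one-level configuration, because the single fast processor shortens the sequential bottleneck while adding little to the total cost. Different parameters can reverse the outcome.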
Two temporal (instruction and data) locality principles with respect to the degree of parallelism are identified and empirically established for a set of programs. These principles suggest the desirability of a hierarchical approach to cost-effective high-performance computing.
In order to conduct detailed analysis of the different features of HPAM machines, a simulator was developed. This simulator (HPAM_Sim) allows the simulation of target machines consisting of different processors and interconnection networks in either contention or non-contention modes.
Using HPAM_Sim, a simulation-based study of the performance achieved by mapping compiler-parallelized and hand-parallelized versions of the CMU benchmarks onto different HPAM machines was conducted. This study establishes that (1) HPAM machines can have higher cost-efficiency than the optimal homogeneous machine for a given application; (2) HPAM machines can benefit from hardware and software support for reconfiguring three-level and two-level machines into two-level and one-level organizations; (3) the performance of a given application, when executed on an HPAM machine, is dictated not only by the degree of parallelism but also by the ratio of total communication time to total computation time of the application; and (4) efficient implementations of collective communication operations can improve the performance of HPAM machines. This last finding led to the study of three important collective communication operations, namely broadcast, scatter and gather, in the context of an HPAM machine.
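To make concrete why the implementation of a collective operation matters, the following minimal sketch builds the message schedule of a binomial-tree broadcast, a standard textbook technique. It is not claimed to be the algorithm studied in the thesis, and it ignores the hierarchy and the differing link speeds of an HPAM machine; the function name and the rank numbering are assumptions made for illustration.

```python
def binomial_broadcast_schedule(n, root=0):
    """Return the rounds of a binomial-tree broadcast among `n` ranks.

    Each round is a list of (sender, receiver) pairs that can proceed in
    parallel. In every round, each rank that already holds the data sends
    it to one rank that does not, so the number of holders doubles.
    """
    rounds = []
    have = 1  # number of ranks (numbered relative to the root) holding the data
    while have < n:
        step = []
        for rel in range(have):
            dst = rel + have
            if dst < n:
                # Map root-relative numbers back to actual ranks.
                step.append(((rel + root) % n, (dst + root) % n))
        rounds.append(step)
        have *= 2
    return rounds


schedule = binomial_broadcast_schedule(8)
for round_number, step in enumerate(schedule, start=1):
    print(f"round {round_number}: {step}")
```

With this schedule the data reaches all ranks in a number of rounds that grows logarithmically with the number of ranks, whereas a root that sends to every other rank in turn needs one round per receiver.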