Latency hiding in parallel systems: a quantitative approach

In many parallel applications, network latency causes a dramatic loss in processor utilization. This paper examines software pipelining as a technique for hiding network latency and quantifies the potential improvements with detailed, instruction-level simulations. The benchmarks are the Livermore Loop kernels and BLAS Level 1. These were parallelized and run on the instruction-level RISC simulator DLX, extended with both a blocking and a pipelined network. Our results show that prefetching over a pipelined network improves performance by a factor of 2 to 9, provided the network has sufficient bandwidth to accept at least 10 requests per processor.
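The following is a minimal sketch of the blocking-versus-pipelined comparison. It uses a toy cycle-count model, not the paper's DLX simulator, and makes three assumptions. Each loop iteration needs one remote operand, as in a BLAS Level 1 loop. A blocking network stalls for the full round-trip latency. A software-pipelined loop issues the prefetch for element `i + window` while it consumes element `i`. The names `blocking_time`, `pipelined_time` and `window`, and the latency and compute costs, are hypothetical. `window` stands in for the number of requests the network can accept per processor.

```python
"""Toy cycle-count model of latency hiding by software-pipelined prefetch.

Hypothetical parameters (illustrative only, not taken from the paper):
  latency - round-trip network latency of one remote load, in cycles
  compute - cycles of local work per loop iteration
  window  - maximum requests one processor may have in flight
"""


def blocking_time(n: int, latency: int, compute: int) -> int:
    """Blocking network: every remote load stalls for the full latency."""
    return n * (latency + compute)


def pipelined_time(n: int, latency: int, compute: int, window: int) -> int:
    """Pipelined network with software-pipelined prefetch.

    Iteration i waits for element i, does its local work, then issues the
    prefetch for element i + window, so at most `window` requests are
    outstanding at any time. Issuing a request is assumed to cost zero
    processor cycles.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    # Prologue: the first `window` requests are sent at cycle 0. Every later
    # entry is overwritten in the loop when its prefetch is issued.
    issue = [0] * n
    t = 0  # processor clock, in cycles
    for i in range(n):
        # Stall until element i has arrived, then do the local work.
        t = max(t, issue[i] + latency)
        t += compute
        # Steady state: keep the window full by prefetching ahead.
        if i + window < n:
            issue[i + window] = t
    return t


if __name__ == "__main__":
    n, latency, compute = 1000, 100, 10
    base = blocking_time(n, latency, compute)
    for window in (1, 2, 5, 10, 20):
        t = pipelined_time(n, latency, compute, window)
        print(f"window={window:2d}  cycles={t:6d}  speedup={base / t:5.2f}")
```

In this model the per-iteration cost approaches max(compute, (latency + compute) / window). Speedup therefore grows with the window until the loop becomes compute-bound. With a window of 1 the pipelined loop is no faster than the blocking one. With latency 100 and compute 10, about ten outstanding requests are needed to hide the latency, which loosely echoes the bandwidth condition stated above. The figures are illustrative and are not the paper's measurements.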