Architectural Support for Fine-Grained Parallelism on Multi-core Architectures
In order to harness the additional compute resources of future Multi-core Architectures (MCAs) with many cores, applications must expose their thread-level parallelism to the hardware. One common approach to doing this is to decompose a program into parallel “tasks” and allow an underlying software layer to schedule these tasks on different threads. Software task scheduling can provide good parallel performance as long as tasks are large compared to the software overhead. We examine a set of Recognition, Mining, and Synthesis (RMS) applications and find that a significant number have small tasks for which software task schedulers achieve only limited parallel speedups. We propose a hardware technique to accelerate dynamic task scheduling on MCAs with many cores. We compare this hardware to highly tuned software task schedulers for a set of RMS benchmarks with small tasks. The proposed hardware delivers significant performance improvements over the best software scheduler: for 64 cores, it is 88% faster on a set of loop-parallel benchmarks and 98% faster on a set of task-parallel benchmarks.
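The task-based decomposition described above can be illustrated with a minimal sketch of a software task scheduler. This is not the scheduler evaluated in the paper or the proposed hardware; the names `run_tasks` and `parallel_sum` and the `chunk_size` parameter are hypothetical and chosen only for illustration. The sketch shows where the per-task software overhead arises: every task pays for one dequeue from a shared queue, so the smaller the tasks, the larger the share of time spent in the scheduler. In CPython the global interpreter lock prevents real parallel speedup for this kind of work, so the example shows structure rather than performance.

```python
import queue
import threading


def run_tasks(tasks, num_workers=4):
    """Run zero-argument callables on worker threads; return results in task order."""
    work = queue.Queue()
    for index, task in enumerate(tasks):
        work.put((index, task))
    results = [None] * len(tasks)

    def worker():
        # All tasks are enqueued before the workers start, so an empty
        # queue means there is no work left for this worker.
        while True:
            try:
                # This dequeue is the per-task scheduling overhead.
                index, task = work.get_nowait()
            except queue.Empty:
                return
            results[index] = task()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def parallel_sum(data, chunk_size, num_workers=4):
    """Split a loop-parallel reduction into tasks of chunk_size elements each."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Bind each chunk as a default argument so every task sums its own chunk.
    tasks = [lambda c=c: sum(c) for c in chunks]
    return sum(run_tasks(tasks, num_workers))
```

Calling `parallel_sum(list(range(1000)), chunk_size=10)` creates 100 small tasks, while `chunk_size=250` creates 4 large ones. Both return the same sum, but the first pays the queue cost 25 times as often, which is the small-task regime the abstract identifies as limiting software schedulers.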
[1] Christopher J. Hughes, et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors, 2007, ISCA '07.
[2] P. K. Dubey, et al. Recognition, Mining and Synthesis Moves Computers to the Era of Tera, 2005.