SPM-aware scheduling for nested loops in CMP systems

Chip multiprocessor (CMP) computing systems are widely used in application domains such as medical image processing, computer vision, and aerospace. In these computation-intensive applications, nested loops account for the largest share of the computation cost and, because of their frequent memory accesses, strongly affect system latency. To enhance the parallelism of a nested loop, a critical task is to map its iterations to processors strategically, so that the loop is well parallelized and the execution latency of the whole application is reduced. One of the most widely used methods for iteration-to-processor mapping is pipelining, in which each processor performs the operations of the iterations mapped to it.
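
As a rough illustration of iteration-to-processor mapping with pipelining, the following minimal sketch assigns outer-loop iterations to processors and computes staggered start times. The round-robin policy, the fixed per-iteration time, and the names `map_iterations`, `pipelined_start_times`, `iter_time`, and `stagger` are assumptions made for this example only; they are not taken from the paper and do not model scratch-pad memory.

```python
def map_iterations(num_iterations, num_processors):
    """Assign iteration i to processor i mod P (assumed round-robin policy)."""
    mapping = {p: [] for p in range(num_processors)}
    for i in range(num_iterations):
        mapping[i % num_processors].append(i)
    return mapping


def pipelined_start_times(num_iterations, num_processors, iter_time, stagger):
    """Return the start time of each iteration under a simple pipeline model.

    Assumptions (hypothetical, for illustration): iteration i may not start
    before i * stagger, and a processor runs its own iterations back to back,
    each taking iter_time time units.
    """
    free_at = [0] * num_processors  # time at which each processor becomes idle
    starts = []
    for i in range(num_iterations):
        p = i % num_processors
        start = max(free_at[p], i * stagger)
        starts.append(start)
        free_at[p] = start + iter_time
    return starts


if __name__ == "__main__":
    print(map_iterations(6, 3))
    print(pipelined_start_times(4, 2, iter_time=10, stagger=2))
```

In this toy model, iterations on different processors overlap in time, while iterations on the same processor are serialized; an actual scheduler would also have to respect loop-carried dependences and memory access costs.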
