Memory bank aware dynamic loop scheduling

In a parallel system with multiple CPUs, one of the key problems is assigning loop iterations to processors. This problem, known as the loop scheduling problem, has been studied extensively, and several schemes, both static and dynamic, have been proposed. One attractive feature of dynamic schemes, compared to their static counterparts, is their ability to exploit variations in execution time across loop iterations. In all the dynamic loop scheduling techniques proposed in the literature so far, performance has been the primary metric of interest. In a battery-operated embedded execution environment, however, power consumption is another metric to consider during iteration-to-processor assignment. In particular, in a banked memory system, this assignment can have an important impact on memory power consumption, which can be a significant portion of overall energy consumption, especially for data-intensive embedded applications such as those in the image processing domain. This paper presents a bank-aware dynamic loop scheduling scheme for array-intensive embedded media applications. The goal of this scheduling scheme is to minimize the number of memory banks needed to execute the current working set (the group of loop iterations being executed) when all processors are considered together. That is, during iteration-to-processor assignment, our approach considers the bank access patterns of loop iterations and carefully selects the set of iterations to assign to an idle processor so that, if possible, the number of memory banks currently in use does not increase. Our experimental results show that the proposed scheme yields much better energy results than prior loop scheduling techniques and is also competitive with the scheduler that generates the best performance.
To our knowledge, this is the first dynamic loop scheduling scheme that is memory bank aware.
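
The selection step described above can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes that each iteration i touches element i of a single array laid out contiguously across banks, and that the scheduler greedily picks the pending chunk that adds the fewest banks to the set already in use. The names `banks_touched`, `pick_chunk`, and `elems_per_bank` are hypothetical.

```python
from typing import Dict, Set


def banks_touched(chunk: range, elems_per_bank: int) -> Set[int]:
    """Banks accessed by a chunk of iterations.

    Assumption: iteration i touches array element i, and elements are
    laid out contiguously, elems_per_bank elements per bank.
    """
    return {i // elems_per_bank for i in chunk}


def pick_chunk(pending: Dict[int, Set[int]], active_banks: Set[int]) -> int:
    """Choose a chunk for an idle processor.

    Greedy rule (an assumption): prefer the chunk that adds the fewest
    banks to those already in use by the other processors; break ties
    by the lowest chunk id.
    """
    return min(pending, key=lambda cid: (len(pending[cid] - active_banks), cid))


# Six chunks of four iterations each, eight array elements per bank:
# chunks 0-1 fall in bank 0, chunks 2-3 in bank 1, chunks 4-5 in bank 2.
chunks = {cid: banks_touched(range(cid * 4, cid * 4 + 4), elems_per_bank=8)
          for cid in range(6)}

# The other processors currently keep bank 1 active.
active = {1}
chosen = pick_chunk(chunks, active)
```

With bank 1 already active, the sketch selects chunk 2, because it can run without waking another bank. A performance-only scheduler that hands out chunks in index order would pick chunk 0 and bring bank 0 into use as well.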