Runtime code parallelization for on-chip multiprocessors

Chip multiprocessing (or multiprocessor system-on-a-chip) is a technique that combines two or more processor cores on a single piece of silicon to enhance computing performance. An important problem to be addressed in executing applications on an on-chip multiprocessor environment is to select the most suitable number of processors to use for a given objective function (e.g., minimizing execution time or energy-delay product) under multiple constraints. Previous research proposed an ILP-based solution to this problem that is based on exhaustive evaluation of each nest under all possible processor sizes. In this paper, we take a different approach and propose a pure runtime strategy for determining the best number of processors to use at runtime. This approach is more general than static techniques and can be applicable in situations where the latter cannot be.

[1]  Diana Marculescu,et al.  Power aware microarchitecture resource scaling , 2001, Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001.

[2]  William J. Bowhill,et al.  Design of High-Performance Microprocessor Circuits , 2001 .

[3]  S. Turner,et al.  Performance Analysis Using the MIPS R10000 Performance Counters , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[4]  Norman P. Jouppi,et al.  CACTI: an enhanced cache access and cycle time model , 1996, IEEE J. Solid State Circuits.

[5]  Michael Wolfe,et al.  High performance compilers for parallel computing , 1995 .

[6]  Mahmut T. Kandemir,et al.  The design and use of simplePower: a cycle-accurate energy estimation tool , 2000, Proceedings 37th Design Automation Conference.

[7]  Luca Benini,et al.  System-level power optimization: techniques and tools , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[8]  Mahmut T. Kandemir,et al.  An integer linear programming based approach for parallelizing applications in On-chip multiprocessors , 2002, DAC '02.

[9]  Kathryn S. McKinley,et al.  Automatic and interactive parallelization , 1992 .

[10]  Grant Martin,et al.  System-on-Chip design , 2001, ASICON 2001. 2001 4th International Conference on ASIC Proceedings (Cat. No.01TH8549).

[11]  Rajeev Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, MICRO 33.

[12]  Tughrul Arslan,et al.  Proceedings Design, Automation and Test in Europe Conference and Exhibition , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.