Improving workload balance and code optimization on processor-in-memory systems

Processor-in-memory (PIM) architectures have recently been proposed to narrow the performance gap between processor and memory. An earlier study by Huang and Chu [Proceedings of the 2nd Workshop on Intelligent Memory Systems, Cambridge, MA, 2000] designed a statement-based parallelizing system, SAGE, to exploit the potential benefits of PIM. This study extends SAGE to achieve better performance. Several optimization approaches, including self-patch weight evaluation, loop splitting for PIM, intelligent memory operation (IMOP) recognition, and tiling for PIM, are devised to produce execution schedules with improved load balance. Experimental results confirm the effectiveness of the proposed methods.
