Empirical Study for Optimization of Power-Performance with On-Chip Memory

Power-performance (performance per unit of power consumed) has recently become an increasingly important factor in modern high-performance microprocessors. In processor design, it is well known that off-chip memory access has a large impact on both performance and power consumption. On-chip memory is one solution to this problem, and many processors, such as the Renesas SH-4 and some ARM-architecture processors, adopt on-chip memory that resides at the same level of the memory hierarchy as the cache. In this study, the effectiveness of the on-chip memory in an SH-4 processor was quantitatively examined by directly measuring the real power consumption of the processor. For these experiments, we propose a method that uses the on-chip memory to reduce power. The experimental results show that optimizing data transfer with on-chip memory reduces EDP (energy-delay product) by up to 15.2%. As an extension of on-chip memory, we have proposed an on-chip RAM architecture called SCIMA (Software Controllable Integrated Memory Architecture), which enables DMA (direct memory access) transfers to the on-chip memory. According to empirical data from the SH-4 processor, the additional DMA transfer enabled by SCIMA reduces EDP by up to 26.3%.
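EDP is conventionally defined as energy multiplied by execution time (E × T, equivalently average power × T²), so an optimization that shortens runtime is rewarded both through lower energy and through lower delay. The sketch below shows how a percentage EDP reduction such as those reported above can be computed from measured average power and execution time. It assumes this standard definition; the `Run` class, `edp_reduction_pct`, and the numbers used are hypothetical illustrations, not the study's tooling or measurements.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One measured run (hypothetical helper): average power in W, time in s."""
    avg_power_w: float
    time_s: float

    @property
    def energy_j(self) -> float:
        # Energy (J) = average power (W) x execution time (s)
        return self.avg_power_w * self.time_s

    @property
    def edp(self) -> float:
        # Energy-delay product (J*s) = energy x delay = power x time^2
        return self.energy_j * self.time_s


def edp_reduction_pct(baseline: Run, optimized: Run) -> float:
    """Percentage by which `optimized` lowers EDP relative to `baseline`."""
    return (1.0 - optimized.edp / baseline.edp) * 100.0


if __name__ == "__main__":
    # Illustrative values only; they are NOT measurements from the SH-4 study.
    base = Run(avg_power_w=2.0, time_s=1.0)
    opt = Run(avg_power_w=1.8, time_s=0.9)
    print(f"EDP reduction: {edp_reduction_pct(base, opt):.1f}%")
```

In this made-up example, a 10% cut in both power and runtime yields roughly a 27% EDP reduction, which illustrates why EDP favors optimizations, such as avoiding off-chip accesses, that improve time and energy together.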
