Increasing on-chip memory space utilization for embedded chip multiprocessors through data compression

Minimizing the number of off-chip memory references is very important in chip multiprocessors from both the performance and power perspectives. To achieve this, the distance between successive reuses of the same data block must be reduced. However, this may not be possible in many cases because of data dependences between computations assigned to different processors. This paper focuses on software-managed on-chip memory space utilization for embedded chip multiprocessors and proposes a compression-based approach that reduces the memory space occupied by data blocks with large inter-processor reuse distances. The proposed approach has two major components: a compiler and an ILP (integer linear programming) solver. The compiler analyzes the application code and extracts information on data access patterns. This access pattern information is then passed to our ILP solver, which determines which data blocks to compress and decompress and the program points at which to do so. We tested the effectiveness of this ILP-based approach using access patterns extracted by our compiler from application codes. Our experimental results reveal that the proposed approach is very effective in reducing power consumption. Moreover, it leads to lower energy consumption than an alternate scheme evaluated in our experiments for all the test cases studied.
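
The notion of a data block with a large reuse distance can be illustrated with a small sketch. The code below is not the ILP formulation used in the paper; it is a simple threshold heuristic, written under assumed conventions, that only shows what kind of access pattern information marks a block as a candidate for compression. The trace format (a list of `(processor, block)` pairs in program order), the function names `reuse_gaps` and `compression_candidates`, and the `threshold` parameter are all hypothetical.

```python
from collections import defaultdict

def reuse_gaps(trace):
    """Map each block to the gaps (in program steps) between its successive accesses.

    `trace` is an assumed format: (processor_id, block_name) pairs in program order.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for step, (_proc, block) in enumerate(trace):
        if block in last_seen:
            gaps[block].append(step - last_seen[block])
        last_seen[block] = step
    return dict(gaps)

def compression_candidates(trace, threshold):
    """Return blocks with at least one reuse gap above `threshold`.

    This is a greedy stand-in for illustration only; the paper instead lets
    an ILP solver choose the blocks and the program points.
    """
    return sorted(
        block for block, g in reuse_gaps(trace).items() if max(g) > threshold
    )

# Hypothetical trace: block A is touched by processor 0 and reused much later by processor 1.
trace = [(0, "A"), (0, "B"), (1, "B"), (0, "C"), (1, "C"), (1, "A")]
candidates = compression_candidates(trace, threshold=3)
```

In this toy trace, block A sits unused between its first and last access, so it is the only block flagged; B and C are reused immediately and would stay uncompressed. A real decision would also have to weigh compression and decompression costs, which is what the ILP formulation is for.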
