Energy Minimization and Latency Hiding for Heterogeneous Parallel Memory

Many high-performance DSP processors employ multi-module on-chip memory to improve performance and reduce power consumption. This paper studies the scheduling and assignment problem of minimizing total energy while satisfying performance constraints for applications with loops. An algorithm, LSAMEM (Loop Scheduling and Assignment to Minimize Energy for Memory), is proposed. The algorithm attempts to maximize energy savings while satisfying the timing constraint with a guaranteed probability. Experimental results show that LSAMEM achieves significant energy savings on average.
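To make the problem statement concrete, the following is a minimal sketch of the underlying assignment problem, not the LSAMEM algorithm itself. It assumes a simple chain of memory operations, deterministic access times (the probabilistic timing guarantee is left out), and a hypothetical table of module types with illustrative time and energy values. The function name `min_energy_assignment` is also an assumption made for this example.

```python
from itertools import product

# Hypothetical memory module types: name -> (access_time, energy_per_access).
# These numbers are illustrative only and do not come from the paper.
MODULE_TYPES = {
    "fast": (1, 6),
    "medium": (2, 3),
    "slow": (4, 1),
}


def min_energy_assignment(num_ops, deadline):
    """Brute-force search over module-type assignments.

    Models a chain of `num_ops` sequential memory operations, so the total
    time is the sum of the access times. Returns the tuple
    (total_energy, total_time, assignment) with minimum energy among the
    assignments that meet the deadline, or None if none does.
    """
    best = None
    for choice in product(MODULE_TYPES, repeat=num_ops):
        total_time = sum(MODULE_TYPES[m][0] for m in choice)
        total_energy = sum(MODULE_TYPES[m][1] for m in choice)
        if total_time > deadline:
            continue  # violates the timing constraint
        if best is None or total_energy < best[0]:
            best = (total_energy, total_time, choice)
    return best


if __name__ == "__main__":
    # Three operations that must finish within 6 time units.
    print(min_energy_assignment(3, 6))
```

Exhaustive search grows exponentially with the number of operations, so it serves only to illustrate the trade-off between access time and energy that a scheduling and assignment algorithm has to navigate.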
