Loop scheduling and bank type assignment for heterogeneous multi-bank memory

Many high-performance DSP processors employ multi-bank on-chip memory to improve performance and reduce energy consumption. This architectural feature provides higher memory bandwidth by allowing multiple data memory accesses to execute in parallel. However, making effective use of multi-bank memory remains difficult when performance and energy requirements must be considered together. This paper studies the combined scheduling and assignment problem for applications with loops on heterogeneous multi-bank memory: how to minimize total energy consumption while satisfying a timing constraint. An algorithm, TASL (Type Assignment and Scheduling for Loops), is proposed. The algorithm performs bank type assignment while taking variable partitioning into account, in order to find the best configuration for both memory and ALU. Experimental results show that TASL achieves a significant average improvement in energy saving.
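
To make the problem statement concrete, the following is a minimal sketch of energy-minimizing type assignment under a timing constraint. It is not TASL: it ignores loops, retiming, ALU configuration, variable partitioning, and parallel accesses across banks, and simply treats a chain of sequential memory accesses. The bank type names, latencies, and energy values in `BANK_TYPES`, and the function `assign_bank_types`, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical bank types: name -> (access latency in cycles, energy per access).
# These numbers are illustrative assumptions, not measurements from the paper.
BANK_TYPES: Dict[str, Tuple[int, int]] = {
    "fast": (1, 6),
    "medium": (2, 3),
    "slow": (4, 1),
}


def assign_bank_types(
    num_accesses: int, deadline: int
) -> Optional[Tuple[int, List[str]]]:
    """Choose a bank type for each access in a sequential chain.

    Returns (minimum total energy, list of chosen type names), or None if
    no assignment finishes within `deadline` cycles.
    """
    # best[t] = cheapest (energy, assignment) that finishes the accesses
    # processed so far in exactly t cycles.
    best: Dict[int, Tuple[int, List[str]]] = {0: (0, [])}
    for _ in range(num_accesses):
        nxt: Dict[int, Tuple[int, List[str]]] = {}
        for t, (energy, chosen) in best.items():
            for name, (latency, cost) in BANK_TYPES.items():
                t2 = t + latency
                if t2 > deadline:
                    continue  # this choice would violate the timing constraint
                candidate = (energy + cost, chosen + [name])
                if t2 not in nxt or candidate[0] < nxt[t2][0]:
                    nxt[t2] = candidate
        best = nxt
    if not best:
        return None
    # Any finishing time within the deadline is acceptable; take the cheapest.
    return min(best.values(), key=lambda entry: entry[0])


if __name__ == "__main__":
    for deadline in (2, 3, 6, 8, 12):
        print(deadline, assign_bank_types(3, deadline))
```

With these assumed numbers, relaxing the deadline lets more accesses move to slower, lower-energy banks, which is the trade-off between timing and energy that the abstract describes.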
