Combined Application of Data Transfer and Storage Optimizing Transformations and Subword Parallelism Exploitation for Power Consumption and Execution Time Reduction in VLIW Multimedia Processors

This paper addresses the main issues in mapping data-dominated multimedia applications onto Very Long Instruction Word (VLIW) multimedia processors. The main design quality factors of applications realized on this target architecture platform are presented and their interactions are explored. Power consumption is the major cost factor, while performance is the overriding constraint. A methodology has been developed that reduces both the power consumed by data transfer and storage, which forms an important part of the total system power budget, and the execution time of applications realized on VLIW multimedia processors. The methodology applies a number of transformations, mainly oriented towards data transfer and storage optimization, to a high-level description of the target application. The main focus of this paper is the interaction of the proposed code transformations with the exploitation of subword parallelism, for example through the special performance-improving arithmetic subword instructions present in modern VLIW multimedia processors. Experimental results from real-life data-dominated multimedia applications demonstrate that the proposed transformations are orthogonal to the exploitation of subword parallelism. A second conclusion is that, for the complete application, the positive impact of the proposed code transformations on performance is typically even larger than that of subword parallelism exploitation. Moreover, the benefit of subword parallelism exploitation is enhanced once the proposed code transformations have been applied.
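To make the term subword parallelism concrete, the following is a minimal sketch in portable C that emulates a packed addition of four 8-bit values held in one 32-bit word. It is an illustration only, not the paper's method and not the instruction set of any particular processor: the names `add4x8`, `add_scalar`, and `add_packed` are hypothetical, and a real VLIW multimedia processor would provide the packed addition as a single hardware instruction rather than the masking shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add four unsigned 8-bit lanes packed into 32-bit
 * words. Each lane wraps modulo 256 and no carry crosses a lane boundary. */
static uint32_t add4x8(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; a carry can reach bit 7 at most. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Sum of the top bits of each lane, ignoring the carry from below. */
    uint32_t msb = (a ^ b) & 0x80808080u;
    /* Combine: XOR folds the carry into bit 7 and discards the carry-out. */
    return low ^ msb;
}

/* Scalar reference: one 8-bit addition per loop iteration. */
static void add_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Packed version: four 8-bit additions per loop iteration.
 * Assumes n is a multiple of 4 (an assumption of this sketch). */
static void add_packed(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        uint32_t wa, wb, wr;
        memcpy(&wa, a + i, sizeof wa);   /* load four pixels as one word */
        memcpy(&wb, b + i, sizeof wb);
        wr = add4x8(wa, wb);
        memcpy(dst + i, &wr, sizeof wr); /* store four results at once */
    }
}
```

The packed loop performs a quarter of the iterations and a quarter of the word-sized memory accesses of the scalar loop. This is one way to see why the number and ordering of data accesses, which the data transfer and storage transformations target, can be tuned independently of how many subwords each arithmetic operation processes.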
