Behavioral-Level Performance and Power Exploration of Data-Intensive Applications Mapped on Programmable Processors

The continuous increase of the computational power of programmable processors has established them as an attractive design alternative, for implementation of the most computationally intensive applications, like video compression. To enforce this trend, designers implementing applications on programmable platforms have to be provided with reliable and in-depth data and instruction analysis that will allow for the early selection of the most appropriate application for a given set of specifications. To address this need, we introduce a new methodology for early and accurate estimation of the number of instructions required for the execution of an application, together with the number of data memory transfers on a programmable processor. The high-level estimation is achieved by a series of mathematical formulas; these describe not only the arithmetic operations of an application, but also its control and addressing operations, if it is executed on a programmable core. The comparative study, which is done using three popular processors (ARM, MIPS, and Pentium), shows the high efficiency and accuracy of the methodology proposed, in terms of the number of executed (micro-)instructions (i.e. performance) and the number of data memory transfers (i.e. memory power consumption). Using the proposed methodology we estimated an average deviation of 23% in our estimated figures compared with the measurements taken from the real execution on the CPUs.

[1]  Antonios Argyriou,et al.  Data-Reuse and Parallel Embedded Architectures for Low-Power, Real-Time Multimedia Applications , 2000, PATMOS.

[2]  Erik Brockmeyer,et al.  Data and memory optimization techniques for embedded systems , 2001, TODE.

[3]  Hugo De Man,et al.  Formalized methodology for data reuse: exploration for low-power hierarchical memory mappings , 1998, IEEE Trans. Very Large Scale Integr. Syst..

[4]  Antonios Argyriou,et al.  Power and performance exploration of embedded systems executing multimedia kernels , 2002 .

[5]  H. De Man,et al.  Global communication and memory optimizing transformations for low power signal processing systems , 1994, Proceedings of 1994 IEEE Workshop on VLSI Signal Processing.

[6]  Peter Pirsch,et al.  VLSI architectures for video compression-a survey , 1995, Proc. IEEE.

[7]  David F. Bacon,et al.  Compiler transformations for high-performance computing , 1994, CSUR.

[8]  Massoud Pedram,et al.  Power Aware Design Methodologies , 2002 .

[9]  Francky Catthoor,et al.  Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design , 1998 .

[10]  Mahmut T. Kandemir,et al.  EAC: a compiler framework for high-level energy estimation and optimization , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[11]  Francky Catthoor,et al.  Custom Memory Management Methodology , 1998, Springer US.

[12]  Gauthier Lafruit,et al.  The Local Wavelet Transform: a memory-efficient, high-speed architecture optimized to a Region-Oriented Zero-Tree coder , 2000, Integr. Comput. Aided Eng..

[13]  Peter Kuhn,et al.  Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation , 1999, Springer US.

[14]  Anil K. Jain,et al.  Displacement Measurement and Its Application in Interframe Image Coding , 1981, IEEE Trans. Commun..

[15]  Antonios Argyriou,et al.  A memory management approach for efficient implementation of multimedia kernels on programmable architectures , 2001, Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems.

[16]  Konstantinos Konstantinides,et al.  Image and Video Compression Standards: Algorithms and Architectures , 1997 .

[17]  Hugo De Man,et al.  Platform Independent Data Transfer and Storage Exploration Illustrated on Parallel Cavity Detection Algorithm , 1999, PDPTA.

[18]  Massoud Pedram,et al.  Low power design methodologies , 1996 .

[19]  Young Serk Shim,et al.  A fast hierarchical motion vector estimation algorithm using mean pyramid , 1995, IEEE Trans. Circuits Syst. Video Technol..

[20]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[21]  Hiroshi Takeno,et al.  Video DSP architecture for MPEG2 codec , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[22]  Erik Brockmeyer,et al.  Data Access and Storage Management for Embedded Programmable Processors , 2002, Springer US.

[23]  Mahmut T. Kandemir,et al.  Reducing memory requirements of nested loops for embedded systems , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).