Power Consumption in a Real, Commercial Multimedia Core

Peak power and total energy consumption are key factors in the design of embedded microprocessors. Many techniques have been shown to provide large reductions in peak power and/or energy consumption. Unfortunately, research studies often make unrealistic assumptions, especially with regard to multimedia processors. This paper focuses on power reduction in real commercial processors, and on how it differs from more abstract research studies. First, such processors often already use several power reduction techniques, and these existing optimizations can have a large impact on the effectiveness of further optimizations. Second, highly optimized production code tends to have significantly less schedule slack and a significantly higher density of memory accesses than unoptimized code. Finally, many such studies are done using high-level simulators, which may not accurately model the power consumption of real microprocessors. In addition, in this study we focus on an embedded, synthesized processor rather than a high-performance, custom, hand-designed stand-alone microprocessor; a 400 MHz synthesized core (the TriMedia TM3270) has significantly different characteristics than a 3 GHz Pentium. We carefully analyze the power consumption of the TriMedia TM3270, a commercial product, on both reference benchmark code and optimized code. We use commercial synthesis and simulation tools to obtain a detailed breakdown of where power is consumed. We find that increased functional unit utilization causes significant differences in power consumption between unoptimized and carefully hand-optimized code. We also apply some simple techniques for power savings with no performance degradation, and find that such techniques can greatly change the power profile of a microprocessor. We find that clock gating of individual functional units is vital to keeping the dynamic power low. Finally, we find that synthesizing for the fastest target frequency possible at a given voltage yields the most energy-efficient design.
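The last two findings can be loosely illustrated with the textbook first-order CMOS model, in which dynamic power is roughly activity × capacitance × V² × f and leakage is a constant power draw. The sketch below is only an illustration under that assumption; it is not the model or tool flow used in the study, and the function name, parameters, and all numeric values are hypothetical.

```python
def task_energy_joules(cycles, freq_hz, vdd, c_eff_farads, activity, leakage_watts):
    """First-order energy estimate for a task of `cycles` clock cycles.

    Assumes the textbook model P_dyn = activity * C_eff * Vdd^2 * f plus a
    constant leakage power. All parameters are hypothetical illustration
    values, not measurements of any real core.
    """
    runtime_s = cycles / freq_hz
    dynamic_power_w = activity * c_eff_farads * vdd ** 2 * freq_hz
    return (dynamic_power_w + leakage_watts) * runtime_s


# Hypothetical workload and core parameters.
CYCLES = 1_000_000_000
VDD = 1.2
C_EFF = 1e-9
LEAKAGE_W = 0.05

# Clock gating idle functional units lowers the effective switching activity.
ungated = task_energy_joules(CYCLES, 400e6, VDD, C_EFF, activity=0.30, leakage_watts=LEAKAGE_W)
gated = task_energy_joules(CYCLES, 400e6, VDD, C_EFF, activity=0.20, leakage_watts=LEAKAGE_W)

# Same voltage, different clock frequency: the dynamic energy per task is
# about the same, but the faster run leaks for less time.
slow = task_energy_joules(CYCLES, 200e6, VDD, C_EFF, activity=0.20, leakage_watts=LEAKAGE_W)
fast = task_energy_joules(CYCLES, 400e6, VDD, C_EFF, activity=0.20, leakage_watts=LEAKAGE_W)

print(f"ungated: {ungated:.3f} J, gated: {gated:.3f} J")
print(f"200 MHz: {slow:.3f} J, 400 MHz: {fast:.3f} J")
```

This simple model holds capacitance and leakage fixed across frequencies, whereas synthesizing for a faster target generally changes both, so it should be read as intuition for the trend rather than as a reproduction of the reported result.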
