SIMD ISA Extensions: Tradeoff Between Power Consumption and Performance on a Superscalar Processor

Preface

The limits imposed by power consumption are becoming an issue in most areas of computing. The need to limit power consumption is readily apparent for portable and mobile platforms, the laptop and the cell phone being the most common examples, but limiting power is becoming important in other computing settings too. This workshop is the third in a series of workshops and tutorials designed to bring together researchers working on architecture-power trade-offs and to give them a forum for discussing preliminary ideas. It helped raise the architecture community's awareness of power issues. The workshop was followed by a tutorial, the Cool Chips Tutorial at MICRO-32 in Haifa, Israel, which assembled speakers from leading microprocessor companies to present what they consider their critical low-power issues, now and in the future, along with some possible solutions.

Abstract

This paper discusses early results from a project called Morph, whose goal is to develop a microarchitecture that can adapt its intrinsic performance dynamically. To a first approximation, power dissipation is proportional to performance raised to a power greater than one, so reducing performance reduces power even faster and greatly improves performance per watt. These techniques are largely independent of other techniques, such as voltage scaling, and therefore add an extra runtime "gear" that a real embedded system can manipulate. Beyond the microarchitecture work, the paper briefly discusses the driving application for the evaluation, planetary rovers, and some of the runtime software issues that must be worked through to make "gear-changing" a usable technology for real applications.
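The abstract's central observation can be made concrete with a small numeric sketch. It assumes a power law P = c · S^k between power P and performance S; the constant c and the exponent k are hypothetical, and the only property taken from the text is that k > 1. Performance per watt then simplifies to S^(1−k) / c, which grows as S shrinks.

```python
# Sketch of the first-order relation P = c * S**k between power P and
# performance S. The constant c and exponent k are hypothetical; the only
# property taken from the text is that k > 1.

def power(perf: float, c: float = 1.0, k: float = 2.0) -> float:
    """Relative power under the assumed power law P = c * perf**k."""
    if perf <= 0:
        raise ValueError("performance must be positive")
    if k <= 1:
        raise ValueError("the argument assumes an exponent k > 1")
    return c * perf ** k


def perf_per_watt(perf: float, c: float = 1.0, k: float = 2.0) -> float:
    """Performance per watt, which simplifies to perf**(1 - k) / c."""
    return perf / power(perf, c, k)


if __name__ == "__main__":
    # Relative performance levels that a hypothetical "gear" might select.
    for s in (1.0, 0.75, 0.5, 0.25):
        print(f"perf {s:4.2f}  power {power(s):.4f}  perf/W {perf_per_watt(s):.2f}")
```

With k = 2, halving performance cuts power to a quarter and doubles performance per watt; a larger k makes the gain steeper.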
1.0 Introduction

Many embedded systems go through long periods in which only a low level of compute performance is needed, followed by periods of peak demand. Current high-performance microprocessors, however, are designed to meet these peak needs with microarchitectures that have fixed characteristics and that deliver much lower MIPS per watt than simpler but less powerful CPUs. Thus, ignoring the potential of voltage scaling, the only "knob" an embedded system can turn to lower power during less demanding periods is clock scaling. Unfortunately, lowering the clock …
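The introduction is cut off before it finishes its point about clock scaling, so the sketch below does not reconstruct the authors' argument; it only shows what the textbook first-order CMOS model, P = α·C·V²·f + P_static, predicts for MIPS per watt when the clock is scaled at a fixed supply voltage. All parameter values, the class name `CoreModel`, and the assumption that IPC does not change with frequency are hypothetical choices for illustration.

```python
# First-order CMOS power model: P = alpha * C * Vdd**2 * f + P_static.
# All parameter values are hypothetical; Vdd is held fixed to isolate
# clock scaling from voltage scaling, and IPC is assumed not to change
# with frequency.
from dataclasses import dataclass


@dataclass
class CoreModel:
    alpha: float           # average switching activity factor
    cap_farads: float      # effective switched capacitance (F)
    vdd: float             # supply voltage (V), held fixed
    ipc: float             # instructions per cycle
    p_static: float = 0.0  # leakage power (W), independent of f

    def power(self, f_hz: float) -> float:
        """Total power: dynamic part scales with f, leakage does not."""
        dynamic = self.alpha * self.cap_farads * self.vdd ** 2 * f_hz
        return dynamic + self.p_static

    def mips(self, f_hz: float) -> float:
        """Throughput in millions of instructions per second."""
        return self.ipc * f_hz / 1e6

    def mips_per_watt(self, f_hz: float) -> float:
        return self.mips(f_hz) / self.power(f_hz)


if __name__ == "__main__":
    no_leak = CoreModel(alpha=0.2, cap_farads=1e-9, vdd=1.5, ipc=1.0)
    leaky = CoreModel(alpha=0.2, cap_farads=1e-9, vdd=1.5, ipc=1.0, p_static=0.05)
    for f in (200e6, 100e6, 50e6):
        print(f"{f / 1e6:5.0f} MHz  "
              f"MIPS/W (no leakage) {no_leak.mips_per_watt(f):8.1f}  "
              f"MIPS/W (leakage) {leaky.mips_per_watt(f):8.1f}")
```

Under this model, MIPS per watt equals (IPC / 10^6) / (α·C·V² + P_static / f). With no leakage, halving f halves both power and MIPS, so MIPS per watt is unchanged; any nonzero leakage makes the lower clock setting strictly worse per watt.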