Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling

As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly clocked, globally synchronous systems. We describe an alternative approach, which we call a multiple clock domain (MCD) processor, in which the chip is divided into several clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end, integer units, floating point units, and load-store units. We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use off-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD, compared to a modest 3% savings from voltage scaling a system with a single clock and a single voltage.
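
The energy-delay product used as the figure of merit above is the energy consumed by a run multiplied by its execution time, so a configuration improves it only when the energy saved outweighs the slowdown. The following is a minimal sketch of that arithmetic, not the evaluation code behind the reported results; the function names and the sample numbers (25% less energy, 5% longer run time) are hypothetical and chosen only for illustration.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed times execution time."""
    return energy_joules * delay_seconds


def edp_improvement(base_energy: float, base_delay: float,
                    new_energy: float, new_delay: float) -> float:
    """Fractional reduction in energy-delay product relative to a baseline.

    A positive result means the new configuration has a lower (better) product.
    """
    base = energy_delay_product(base_energy, base_delay)
    new = energy_delay_product(new_energy, new_delay)
    return 1.0 - new / base


# Hypothetical, normalized numbers (not taken from the paper):
# the scaled run uses 25% less energy but takes 5% longer.
improvement = edp_improvement(1.0, 1.0, 0.75, 1.05)
print(f"Energy-delay product improvement: {improvement:.2%}")
```

Because the metric is a product, a run that saved 25% of the energy but took 40% longer would show a net loss (0.75 × 1.40 = 1.05), which is why scaling a domain is only worthwhile when the resulting slowdown stays small.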
