Power efficiency of voltage scaling in multiple clock, multiple voltage cores

Due to increasing clock speeds, increasing design sizes and shrinking technologies, it is becoming more and more challenging to distribute a single global clock throughout a chip. In this paper we study the effect of using a Globally Asynchronous Locally Synchronous (GALS) organization for a superscalar, out-of-order processor, both in terms of power and performance. To this end, we propose a novel modeling and simulation environment for multiple clock cores with static or dynamically variable voltages for each synchronous block. Using this design exploration environment we were able to assess the power/performance tradeoffs available for Multiple Clock, Single Voltage (MCSV), as well as Multiple Clock, Dynamic Voltage (MCDV) cores. Our results show that MCSV processors are 10% more power efficient when compared to single-clock single voltage designs with a performance penalty of about 10%. By exploiting the flexibility of independent dynamic voltage scaling the various clock domains, the power efficiency of GALS designs can be improved by 12% on average, and up to 20% more in select cases. The power efficiency of MCDV cores becomes comparable with the one of Single Clock, Dynamic Voltage (SCDV) cores, while being up to 8% better in some cases. Our results show that MCDV cores consume 22% less power at an average 12% performance loss.

[1]  Ivan E. Sutherland,et al.  Micropipelines , 1989, Commun. ACM.

[2]  Johnny Öberg,et al.  Lowering power consumption in clock by using globally asynchronous locally synchronous design style , 1999, DAC '99.

[3]  Chenming Hu,et al.  Performance and V/sub dd/ scaling in deep submicrometer CMOS , 1998 .

[4]  H. Fair,et al.  Clocking design and analysis for a 600 MHz Alpha microprocessor , 1998, 1998 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC. First Edition (Cat. No.98CH36156).

[5]  Hiroto Yasuura,et al.  Voltage scheduling problem for dynamically variable voltage processors , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[6]  Avi Mendelson,et al.  Coming challenges in microarchitecture and architecture , 2001, Proc. IEEE.

[7]  K.A. Jenkins,et al.  A clock distribution network for microprocessors , 2000, 2000 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.00CH37103).

[8]  L. S. Nielsen,et al.  Low-power operation using self-timed circuits and adaptive scaling of the supply voltage , 1994, IEEE Trans. Very Large Scale Integr. Syst..

[9]  Steven M. Nowick,et al.  A low-latency FIFO for mixed-clock systems , 2000, Proceedings IEEE Computer Society Workshop on VLSI 2000. System Design for a System-on-Chip Era.

[10]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[11]  Miodrag Potkonjak,et al.  Energy minimization of system pipelines using multiple voltages , 1999, ISCAS'99. Proceedings of the 1999 IEEE International Symposium on Circuits and Systems VLSI (Cat. No.99CH36349).

[12]  Peter Robinson,et al.  Self calibrating clocks for globally asynchronous locally synchronous systems , 2000, Proceedings 2000 International Conference on Computer Design.

[13]  Mark Horowitz,et al.  Clustered voltage scaling technique for low-power design , 1995, ISLPED '95.

[14]  Thomas D. Burd,et al.  Design issues for Dynamic Voltage Scaling , 2000, ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514).

[15]  Daniel Marcos Chapiro,et al.  Globally-asynchronous locally-synchronous systems , 1985 .

[16]  Michael L. Scott,et al.  Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[17]  Wolfgang Fichtner,et al.  Practical design of globally-asynchronous locally-synchronous systems , 2000, Proceedings Sixth International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC 2000) (Cat. No. PR00586).

[18]  Jim D. Garside,et al.  AMULET3: a 100 MIPS asynchronous embedded processor , 2000, Proceedings 2000 International Conference on Computer Design.

[19]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[20]  Kenneth Y. Yun,et al.  Pausible clocking-based heterogeneous systems , 1999, IEEE Trans. Very Large Scale Integr. Syst..

[21]  R. Allmon,et al.  High-performance microprocessor design , 1998, IEEE J. Solid State Circuits.