Frontend frequency-voltage adaptation for optimal energy-delay/sup 2/

In this paper, we present a clustered, multiple-clock domain (CMCD) microarchitecture that combines the benefits of both clustering and globally asynchronous locally synchronous (GALS) designs. We also present a mechanism for dynamically adapting the frequency and voltage of the frontend of the CMCD with the goal to optimize the energy-delay/sup 2/ product (ED2P). Our mechanism has minimal hardware cost, is entirely self-adjustable, does not depend on any thresholds, and achieves results close to optimal. We evaluate it on 16 SPEC 2000 applications and report 17.5% ED2P reduction on average (80% of the upper bound).

[1]  Michael L. Scott,et al.  Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor , 2003, ISCA '03.

[2]  Diana Marculescu,et al.  Power and performance evaluation of globally asynchronous locally synchronous processors , 2002, ISCA.

[3]  K.A. Jenkins,et al.  A clock distribution network for microprocessors , 2000, 2000 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.00CH37103).

[4]  Sandhya Dwarkadas,et al.  Dynamic frequency and voltage control for a multiple clock domain microarchitecture , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[5]  H AlbonesiDavid,et al.  Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor , 2003 .

[6]  José González,et al.  Dynamic cluster resizing , 2003, Proceedings 21st International Conference on Computer Design.

[7]  Michael L. Scott,et al.  Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[8]  Norman P. Jouppi,et al.  Cacti 3. 0: an integrated cache timing, power, and area model , 2001 .

[9]  Ramon Canal,et al.  Dynamic cluster assignment mechanisms , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[10]  Norman P. Jouppi,et al.  The multicluster architecture: reducing cycle time through partitioning , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[11]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[12]  Michael L. Scott,et al.  Hiding synchronization delays in a GALS processor microarchitecture , 2004, 10th International Symposium on Asynchronous Circuits and Systems, 2004. Proceedings..

[13]  José Duato,et al.  Efficient interconnects for clustered microarchitectures , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[14]  Michael L. Scott,et al.  Dynamic frequency and voltage control for a multiple clock domain microarchitecture , 2002, MICRO.

[15]  Vikas Agarwal,et al.  Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[16]  Andreas Moshovos,et al.  Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors , 2000, MICRO 33.

[17]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[18]  James E. Smith,et al.  Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.