A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks

We propose an adaptive, scalable architecture for performing real-time, algorithm-specific tasks. The architecture is based on the globally asynchronous, locally synchronous (GALS) design paradigm. We demonstrate that, for various real-time commercial applications with algorithm-specific jobs such as online transaction processing (OLTP) and the Fourier transform, the proposed architecture supports dynamic load balancing and adaptive inter-task voltage scaling. The architecture can also detect process shifts in the individual processing units and determine their appropriate operating conditions. Simulation results for two representative applications show that, for a random job distribution, the architecture achieves up to 67% improvement in MOPS/W (millions of operations per second per watt) over a fully synchronous implementation.
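
The abstract does not give the load-balancing policy or the power model, so the following is only a minimal Python sketch of the two ideas it names (dynamic load balancing and per-unit voltage scaling) and of the MOPS/W metric. It assumes greedy largest-job-first assignment to the least-loaded unit, one operation per cycle, a supply voltage proportional to clock frequency (floored at a minimum), dynamic energy of C*V^2 per operation with leakage and idle power ignored, and a synchronous baseline whose single shared voltage is set by its busiest unit. All names (`balance`, `voltage_for`, `mops_per_watt`, `compare`) and all constants are hypothetical.

```python
"""Minimal sketch: greedy load balancing plus per-unit voltage scaling,
scored in MOPS/W. Every constant and the energy model are hypothetical."""
import random
from dataclasses import dataclass

# Hypothetical technology parameters (not taken from the paper).
F_MAX_HZ = 1.0e9   # maximum clock frequency of one processing unit
V_MAX = 1.2        # supply voltage needed to run at F_MAX_HZ (V)
V_MIN = 0.6        # lowest supply voltage the regulator may select (V)
C_EFF = 1.0e-10    # effective switched capacitance per operation (F)


@dataclass
class Unit:
    ops: int = 0   # operations assigned to this processing unit


def balance(jobs: list[int], n_units: int) -> list[Unit]:
    """Assign jobs, largest first, to the currently least-loaded unit."""
    units = [Unit() for _ in range(n_units)]
    for job in sorted(jobs, reverse=True):
        min(units, key=lambda u: u.ops).ops += job
    return units


def voltage_for(ops: int, deadline_s: float) -> float:
    """Lowest supply voltage that finishes `ops` by the deadline.

    Assumes one operation per cycle and V proportional to f above V_MIN.
    """
    f_needed = ops / deadline_s
    if f_needed > F_MAX_HZ:
        raise ValueError("load misses the deadline even at F_MAX_HZ")
    return max(V_MIN, V_MAX * f_needed / F_MAX_HZ)


def mops_per_watt(total_ops: int, energy_j: float) -> float:
    """(ops/s / 1e6) / (J/s) reduces to ops per joule / 1e6."""
    return total_ops / energy_j / 1e6


def compare(jobs: list[int], n_units: int, deadline_s: float) -> tuple[float, float]:
    """Return (synchronous, per-domain) MOPS/W for the same job set."""
    units = balance(jobs, n_units)
    total = sum(jobs)
    # Single clock/voltage domain: the busiest unit sets V for every unit.
    v_shared = voltage_for(max(u.ops for u in units), deadline_s)
    e_sync = sum(C_EFF * v_shared**2 * u.ops for u in units)
    # GALS-style: each unit selects its own voltage (dynamic energy C*V^2 per op).
    e_gals = sum(C_EFF * voltage_for(u.ops, deadline_s) ** 2 * u.ops for u in units)
    return mops_per_watt(total, e_sync), mops_per_watt(total, e_gals)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the random job set is repeatable
    jobs = [rng.randint(1, 20) * 1_000_000 for _ in range(8)]
    sync, gals = compare(jobs, n_units=3, deadline_s=0.1)
    print(f"sync {sync:.0f} MOPS/W, GALS {gals:.0f} MOPS/W, "
          f"gain {100 * (gals / sync - 1):.1f}%")
```

In this toy model the gain comes only from units whose load lets them run below the shared voltage; perfectly balanced loads, or loads light enough that every unit hits V_MIN, erase it. The printed figures are illustrative and say nothing about the 67% result reported above.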
