Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles

This paper investigates implementation techniques for tile-based chip multiprocessors with Globally Asynchronous Locally Synchronous (GALS) clocking styles. These architectures can simplify the physical design flow because designers can focus on a single processor when designing an entire chip. However, they also introduce challenges in maintaining system robustness and scalability. We propose a physical design flow for these architectures, investigate timing issues for robust implementations, and describe methods to take full advantage of their potential scalability. As a design example, we present data from a recently implemented single-chip 6 x 6 tile-based GALS processing array.
