The Microarchitecture of a Pipelined WaveScalar Processor: An RTL-based Study

WaveScalar is a recently introduced architecture designed to capitalize on the vast number of transistors available on modern processes. Prior work introduced the architecture and used simulation-based results to demonstrate its performance efficiency compared to conventional designs. But can it be built within a commercially viable area budget, and will it achieve a clock speed comparable to that of more conventional superscalars? This paper answers these questions. We have designed a synthesizable RTL model of the WaveScalar microarchitecture, called the WaveCache, which includes its execution substrate, memory system, and interconnect. Using the TSMC 90 nm process and the latest design tools, this model synthesizes to a chip of 252 mm² and achieves a clock rate of 25 FO4. This paper describes the RTL implementation and couples it with results from cycle-level simulation to illustrate the key performance-area-delay trade-offs in WaveCache design.
