A Technology-Scalable Architecture for Fast Clocks and High ILP

CMOS technology scaling poses challenges in designing dynamically scheduled cores that can sustain both high instruction-level parallelism and aggressive clock frequencies. In this paper, we present a new architecture that maps compiler-scheduled blocks onto a two-dimensional grid of ALUs. Within the mapped window of execution, instructions execute in a dataflow-like manner, with each ALU forwarding its result along short wires to the consumers of that result. Our studies of program behavior and a preliminary evaluation indicate that this architecture has the potential for both high clock speeds and high ILP, and may offer the best of both the VLIW and dynamic superscalar architectures.
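The execution model described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Node`, `run_block`, and `hop_delay`, the one-cycle ALU latency, and the use of Manhattan distance as the wire delay between grid slots are all assumptions made for illustration. The sketch places the instructions of one block at fixed grid positions, fires each instruction once all of its operands have arrived, and forwards the result directly to the consumers named by the producer.

```python
from dataclasses import dataclass, field
import operator

# Hypothetical opcode table; the real instruction set is not given in the text.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


@dataclass
class Node:
    op: str        # opcode executed by the ALU at this grid slot
    arity: int     # number of operands the instruction waits for
    targets: list  # [(row, col, operand_slot)] consumers of the result
    operands: dict = field(default_factory=dict)  # slot -> (value, arrival_cycle)


def run_block(grid, inputs, hop_delay=1):
    """Execute one mapped block in dataflow order.

    grid:   {(row, col): Node}, the compiler's static placement (assumed format)
    inputs: [(row, col, slot, value)], block inputs available at cycle 0
    Returns {(row, col): (value, finish_cycle)}.
    """
    for r, c, slot, value in inputs:
        grid[(r, c)].operands[slot] = (value, 0)

    results = {}
    ready = [pos for pos, n in grid.items() if len(n.operands) == n.arity]
    while ready:
        pos = ready.pop()
        node = grid[pos]
        args = [node.operands[s][0] for s in range(node.arity)]
        start = max(node.operands[s][1] for s in range(node.arity))
        value = OPS[node.op](*args)
        done = start + 1  # assumed single-cycle ALU
        results[pos] = (value, done)
        # Forward the result straight to each consumer; delay grows with distance.
        for tr, tc, slot in node.targets:
            hops = abs(tr - pos[0]) + abs(tc - pos[1])
            target = grid[(tr, tc)]
            target.operands[slot] = (value, done + hops * hop_delay)
            if len(target.operands) == target.arity:
                ready.append((tr, tc))
    return results


# Block computing (a + b) * (c - d) with a=2, b=3, c=7, d=4.
grid = {
    (0, 0): Node("add", 2, [(1, 0, 0)]),
    (0, 1): Node("sub", 2, [(1, 0, 1)]),
    (1, 0): Node("mul", 2, []),
}
inputs = [(0, 0, 0, 2), (0, 0, 1, 3), (0, 1, 0, 7), (0, 1, 1, 4)]
results = run_block(grid, inputs)
```

In this toy placement the multiply waits for the operand from the more distant `sub` slot, which shows why the placement chosen by the compiler affects the critical path: consumers placed near their producers receive operands sooner.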
