DTA-C: A Decoupled multi-Threaded Architecture for CMP Systems

One way to exploit Thread-Level Parallelism (TLP) is to use architectures that implement novel multithreaded execution models, such as Scheduled Data-Flow (SDF). SDF promises an elegant decoupled and non-blocking execution of threads. Here we extend that model so that it can be used in future scalable CMP systems, where wire delay forces the design to be partitioned. In this paper we describe our approach and experiment with different distributed schedulers and with different numbers of clusters and processors per cluster to show that our architecture scales well. We present initial results on system scalability and performance, and we suggest design choices to improve the scalability of the basic design.
