Chip multiprocessor based on data-driven multithreading model

Although the dataflow model of execution and its benefits were proposed long ago, the model has not yet been successfully exploited. However, as traditional systems approach their limits in delivering higher performance, new execution models that use dataflow-like concepts are being studied. Among these, Data-Driven Multithreading (DDM) is a multithreading model that effectively hides communication delays and synchronisation overheads. In DDM, threads are scheduled as soon as their input data has been produced, that is, in a dataflow-like way. In addition to motivating the dataflow model of execution, this paper presents an overview of the DDM project. In particular, it focuses on the Chip Multiprocessor (CMP) implementation of the DDM model: its hardware, run-time system and performance evaluation. The DDM-CMP inherits the benefits of both the DDM model, which helps overcome the memory wall, and the CMP, which offers a simpler design, a higher degree of parallelism and greater power-performance efficiency, thereby overcoming the power wall. Preliminary experimental results show a significant benefit in terms of both speedup and power consumption, making the DDM-CMP an attractive architecture for future processors.
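
The data-driven scheduling rule stated in the abstract (a thread becomes runnable once all of its input data has been produced) can be illustrated with a minimal software sketch. This is not the DDM-CMP hardware or run-time system: the function `run_data_driven`, the `ready_count` table and the four example threads are hypothetical names chosen for illustration, and the threads run one after another here rather than in parallel.

```python
from collections import deque


def run_data_driven(threads, consumers):
    """Run threads in data-driven order (illustrative sketch only).

    threads:   dict mapping a thread name to a zero-argument callable.
    consumers: dict mapping a producer name to the names that consume its output.
    Returns the list of thread names in the order they were executed.
    """
    # Each thread waits for as many inputs as it has producers.
    ready_count = {name: 0 for name in threads}
    for outs in consumers.values():
        for consumer in outs:
            ready_count[consumer] += 1

    # Threads with no pending inputs can be scheduled immediately.
    ready = deque(name for name in threads if ready_count[name] == 0)
    order = []
    while ready:
        name = ready.popleft()
        threads[name]()
        order.append(name)
        # Completion of a producer releases its consumers.
        for consumer in consumers.get(name, []):
            ready_count[consumer] -= 1
            if ready_count[consumer] == 0:
                ready.append(consumer)
    return order


# Hypothetical example: two loads feed an add, which feeds a store.
data = {}
threads = {
    "load_a": lambda: data.update(a=2),
    "load_b": lambda: data.update(b=3),
    "add": lambda: data.update(sum=data["a"] + data["b"]),
    "store": lambda: data.update(result=data["sum"]),
}
consumers = {"load_a": ["add"], "load_b": ["add"], "add": ["store"]}

order = run_data_driven(threads, consumers)
```

In this sketch `add` is never polled or blocked: it is enqueued only when the second of its two producers finishes, which is the dataflow-like behaviour the abstract describes.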
