Implementation of a threaded dataflow multiprocessor using FPGAs

This paper presents the FPGA implementation and evaluation of a prototype Data-Driven Multithreading chip multiprocessor. In particular, we study the FPGA implementation of the Thread Synchronization Unit (TSU), a hardware unit that enables thread execution according to dataflow rules on a chip multiprocessor. Threads are scheduled for execution based on data availability, i.e., a thread is fired only if its input data is available. This is the non-blocking Data-Driven Multithreading (DDM) model of execution. Owing to its dataflow characteristics, the model exploits parallelism and tolerates latency. The DDM model has been evaluated using an execution-driven simulator and showed an average speedup of 26 on a 32-node system. For evaluation purposes, the prototype was implemented on a Xilinx Virtex-5 FPGA using MicroBlaze processors as execution cores. Experimental results show that the TSU can be implemented with a moderate hardware budget and that the delays incurred by its operation can be tolerated. Furthermore, a hardware complexity evaluation shows that the TSU size scales very well with the number of processors in the MPSoC.
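The firing rule described above (a thread runs only once its input data is available) can be illustrated with a small software sketch. This is not the TSU design from the paper: the class `ThreadSyncUnit`, its methods, and the per-thread "ready count" bookkeeping are hypothetical names chosen for illustration, and the model is sequential, whereas the real unit dispatches ready threads to several cores.

```python
from collections import deque


class ThreadSyncUnit:
    """Toy software model of data-driven scheduling (hypothetical, not the paper's TSU)."""

    def __init__(self):
        self.ready_count = {}  # thread id -> number of inputs still missing
        self.consumers = {}    # thread id -> ids of threads that consume its output
        self.body = {}         # thread id -> callable executed when the thread fires
        self.ready = deque()   # threads whose inputs are all available

    def add_thread(self, tid, body, num_inputs, consumers=()):
        self.ready_count[tid] = num_inputs
        self.consumers[tid] = list(consumers)
        self.body[tid] = body
        if num_inputs == 0:
            # No pending inputs: the thread can fire immediately.
            self.ready.append(tid)

    def run(self):
        """Fire ready threads until none remain; return the firing order."""
        order = []
        while self.ready:
            tid = self.ready.popleft()
            self.body[tid]()  # non-blocking: the thread runs to completion
            order.append(tid)
            # Completion makes one input available to each consumer.
            for consumer in self.consumers[tid]:
                self.ready_count[consumer] -= 1
                if self.ready_count[consumer] == 0:
                    self.ready.append(consumer)
        return order


# Usage: "sum" needs two inputs, so it fires only after both producers finish.
results = {}
tsu = ThreadSyncUnit()
tsu.add_thread("load_a", lambda: results.update(a=2), 0, ["sum"])
tsu.add_thread("load_b", lambda: results.update(b=3), 0, ["sum"])
tsu.add_thread("sum", lambda: results.update(s=results["a"] + results["b"]), 2)
order = tsu.run()
```

Because a thread is enqueued only when its count of missing inputs reaches zero, it never has to wait on data once started, which is what makes the execution non-blocking in this sketch.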
