Processor design based on dataflow concurrency

Abstract This paper presents new architectural concepts for uniprocessor design that yield a processor conforming to the data-driven (i.e., dataflow) computation paradigm. We show that this processor, called the D²-CPU (Data-Driven CPU), follows the natural flow of programs, minimizes redundant (micro)operations, lowers hardware cost, and reduces power consumption. We assume that programs are developed in a graphical or equivalent language that explicitly shows all data dependencies. Instead of giving the CPU the privileged right to decide which instructions to fetch in each cycle (as in CPUs with a program counter), instructions enter the CPU when they are ready to execute or when all of their operands will be available within a few clock cycles. This way, the application-knowledgeable algorithm, rather than the application-ignorant CPU, is in control, and the CPU is used just as a resource. This approach results in outstanding performance and eliminates many of the redundant operations that plague conventional processor designs, such as the first memory cycle of instruction fetching, which is part of every instruction cycle, and instruction and data prefetches for instructions that are not always needed. A comparative analysis with conventional designs shows that our design is capable of better performance and simpler programming. Finally, a VHDL implementation is used to demonstrate the viability of the approach.
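
The data-driven firing rule described in the abstract can be illustrated with a small software sketch. This is not the D²-CPU design or its VHDL implementation; it is a generic dataflow scheduler written under our own assumptions. The names `run_dataflow`, `graph`, and `missing` are hypothetical, and the lookahead for operands that become available "within a few clock cycles" is not modeled. An instruction is issued only once all of its operands exist, and no program counter selects what runs next.

```python
from collections import deque
import operator


def run_dataflow(graph, inputs):
    """Execute a dataflow graph with a data-driven firing rule.

    graph:  maps an instruction name to (function, list of operand names).
    inputs: maps initially available operand names to their values.
    Returns the final value table and the order in which instructions fired.
    """
    values = dict(inputs)
    consumers = {}   # operand name -> instructions waiting on it
    missing = {}     # instruction -> number of operands not yet produced
    for node, (_, operands) in graph.items():
        missing[node] = sum(1 for op in operands if op not in values)
        for op in operands:
            consumers.setdefault(op, []).append(node)

    # Instructions whose operands are all present are ready to enter the CPU.
    ready = deque(node for node, count in missing.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        fn, operands = graph[node]
        values[node] = fn(*(values[op] for op in operands))
        order.append(node)
        # The new result may complete the operand set of waiting instructions.
        for consumer in consumers.get(node, []):
            missing[consumer] -= 1
            if missing[consumer] == 0:
                ready.append(consumer)
    return values, order


# Hypothetical example program: (a + b) * (a - b)
graph = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
    "prod": (operator.mul, ["sum", "diff"]),
}
values, order = run_dataflow(graph, {"a": 7, "b": 3})
print(values["prod"], order)
```

In this sketch `sum` and `diff` are ready immediately because their operands are inputs, while `prod` fires only after both results exist. The execution order follows from the data dependencies alone, which is the property the abstract attributes to the data-driven paradigm.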
