Stream computations organized for reconfigurable execution

Abstract

Reconfigurable systems can offer the high spatial parallelism and fine-grained, bit-level resource control traditionally associated with hardware implementations, along with the flexibility and adaptability characteristic of software. While reconfigurable systems create new opportunities for engineering and delivering high-performance programmable systems, the traditional approaches to programming and managing computations used for hardware systems (e.g., Verilog, VHDL) and software systems (e.g., C, Fortran, Java) are inappropriate and inadequate for exploiting reconfigurable platforms. To address this need, we develop a stream-oriented compute model, system architecture, and execution patterns which can capture and exploit the parallelism of spatial computations while simultaneously abstracting software applications from hardware details (e.g., timing, device capacity, and microarchitectural implementation details) and consequently allowing applications to scale to exploit newer, larger, and faster hardware platforms. Further, we describe hardware and software techniques that make this late-bound platform mapping viable and efficient.
