Fast placement and routing by extending coarse-grained reconfigurable arrays with Omega Networks

Abstract. Reconfigurable computing architectures are commonly used to accelerate applications and/or to save energy. However, most reconfigurable computing architectures suffer from computationally demanding placement and routing (P&R) steps. This problem may prevent their use in systems that require dynamic compilation (e.g., to guarantee application portability in embedded systems). With the goal of simplifying the P&R steps, this paper presents and analyzes a coarse-grained reconfigurable array (CGRA) extended with global multistage interconnection networks, specifically Omega Networks. We show that integrating one or two Omega Networks in a CGRA simplifies the P&R stage while incurring both low hardware resource overhead and low performance degradation (18% for an 8 × 8 array). We compare the proposed CGRA, which integrates one or two Omega Networks, with a CGRA based on a grid of processing elements with rich neighbor interconnections and a torus topology. The execution time of the P&R stage for the two array architectures shows that the array using two Omega Networks needs a far simpler and faster P&R. On average, the P&R stage in our approach completed in about 16× less time for the 17 benchmarks used. Similarly fast approaches needed CGRAs with more complex interconnect resources for most of the benchmarks used to be successfully placed and routed.
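To illustrate why routing through an Omega Network can be computationally cheap, the following is a minimal sketch of textbook destination-tag routing on an N-input Omega Network (N = 2^n, with n stages, each a perfect shuffle followed by 2 × 2 switches). It is not the P&R algorithm of this paper; the function names `omega_path` and `route_all` and the greedy first-come conflict check are assumptions made for illustration only.

```python
def omega_path(src, dst, n_bits):
    """Return the line index occupied after each stage when routing src -> dst.

    Hypothetical helper: standard destination-tag routing, not the paper's method.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit line index left by one position.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The 2x2 switch picks its upper (0) or lower (1) output according
        # to the destination bit for this stage, most significant bit first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def route_all(connections, n_bits):
    """Greedily route (src, dst) pairs; return False on the first conflict.

    A conflict occurs when two different sources need the same line at the
    same stage. Each path is unique, so no search or backtracking is needed.
    """
    used = {}  # (stage, line) -> source currently driving that line
    for src, dst in connections:
        for stage, line in enumerate(omega_path(src, dst, n_bits)):
            owner = used.setdefault((stage, line), src)
            if owner != src:
                return False
    return True


if __name__ == "__main__":
    print(omega_path(2, 5, 3))                        # path of 2 -> 5 in an 8-input network
    print(route_all([(i, i) for i in range(8)], 3))   # identity permutation
    print(route_all([(0, 0), (4, 1)], 3))             # two routes that collide
```

Each connection costs O(log N) bit operations and has exactly one possible path, which is what makes the routing step simple; the price is that an Omega Network is blocking, so some connection sets cannot be routed in a single pass.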
