A CAD framework for Malibu: an FPGA with time-multiplexed coarse-grained elements

Modern FPGAs are used to implement a wide range of circuits, many of which contain both coarse-grained and fine-grained components. The ever-increasing size of these circuits places great demands on CAD tools to synthesize circuits faster without loss of quality. Synthesizing coarse-grained components onto fine-grained FPGA resources is inefficient, and past attempts to optimize FPGAs for word-oriented datapaths have met with limited success. This paper presents a CAD flow that fully compiles Verilog into a configuration bitstream for a new type of FPGA with time-multiplexed coarse-grained resources. We demonstrate two approaches that achieve average synthesis-time speedups of 61x and 42x over Quartus II. However, because of time-multiplexing and current synthesis limitations, the resulting circuits run 14x and 8.5x slower on average. We also show that the tools can trade density for maximum clock frequency.