Improving the Area Efficiency of Heterogeneous FPGAs with Shadow Clusters

Field Programmable Gate Arrays (FPGAs) serve the microchip market for designs that must be created quickly, produced in small volume, or updated in the field. FPGAs have not taken over the market for large-capacity, high-volume Application-Specific Integrated Circuits (ASICs) because their cost is too high. This cost is mainly due to the large area gap between FPGAs and ASICs. One approach to improving the area efficiency of FPGAs is to include hard “specific” circuits on the FPGA fabric. These circuits can implement functionality in less silicon area, at higher speed, and with lower power consumption than the same functionality implemented in the programmable elements of an FPGA. Common examples include hard multipliers and hard memories. The fundamental question in FPGA architecture research is which hard circuits to include on an FPGA. Every included hard circuit must be used by, and provide a benefit to, the range of designs mapped to FPGAs.
