Measuring and Navigating the Gap Between FPGAs and ASICs

Ian Carlos Kuon. Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto, 2008.

Field-programmable gate arrays (FPGAs) have enjoyed increasing use due to their low non-recurring engineering (NRE) costs and their straightforward implementation process. However, they have higher per-unit costs, poorer performance and higher power consumption than custom alternatives such as application-specific integrated circuits (ASICs). This thesis investigates the extent of this gap and examines the trade-offs that can be made to narrow it.

The gap between 90 nm FPGAs and ASICs was measured for many benchmark circuits. For circuits that use only general-purpose combinational logic and flip-flops, the FPGA-based implementation requires 35 times more area on average than an equivalent ASIC. Modern FPGAs also contain "hard" specific-purpose circuits such as multipliers and memories, and these blocks are found to narrow the average area gap to a factor of 18 for our benchmarks or, potentially, as low as 4.7 when the hard blocks are heavily used. The FPGA was found to be on average between 3.4 and 4.6 times slower than an ASIC, and this gap was not influenced significantly by hard memories and multipliers. The dynamic power consumption is approximately 14 times greater on average on the FPGA than on the ASIC, but hard blocks showed promise for reducing this gap. This is one of the most comprehensive analyses of the gap performed to date.

The thesis then focuses on exploring the area and delay trade-offs possible through architecture, circuit structure and transistor sizing. These trade-offs can be used to selectively narrow the FPGA to ASIC gap but past explorations have been limited in
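To put the quoted area figures in proportion, the short sketch below computes how much of the logic-only area gap the hard blocks recover. It uses only the averages stated in the abstract; the constant names and the `narrowing_factor` helper are illustrative and do not come from the thesis.

```python
# Average FPGA-to-ASIC area ratios quoted in the abstract (90 nm benchmarks).
# The names below are illustrative, not taken from the thesis.
AREA_GAP_LOGIC_ONLY = 35.0         # general-purpose logic and flip-flops only
AREA_GAP_WITH_HARD_BLOCKS = 18.0   # benchmark average with hard blocks
AREA_GAP_HEAVY_HARD_BLOCKS = 4.7   # potential gap when hard blocks are heavily used


def narrowing_factor(baseline_gap: float, reduced_gap: float) -> float:
    """Return how many times smaller the reduced gap is than the baseline gap."""
    if baseline_gap <= 0 or reduced_gap <= 0:
        raise ValueError("gap ratios must be positive")
    return baseline_gap / reduced_gap


if __name__ == "__main__":
    typical = narrowing_factor(AREA_GAP_LOGIC_ONLY, AREA_GAP_WITH_HARD_BLOCKS)
    heavy = narrowing_factor(AREA_GAP_LOGIC_ONLY, AREA_GAP_HEAVY_HARD_BLOCKS)
    print(f"typical hard-block use narrows the area gap by {typical:.2f}x")  # 1.94x
    print(f"heavy hard-block use narrows the area gap by {heavy:.2f}x")      # 7.45x
```

On these figures, typical hard-block use roughly halves the area gap, while heavy use shrinks it by a factor of about seven. Because the inputs are benchmark averages, the results are indicative rather than a bound for any particular design.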
