Tolerating operational faults in cluster-based FPGAs

In recent years the application space of reconfigurable devices has grown to include many platforms with a strong need for fault tolerance. While these systems frequently contain hardware redundancy to allow continued operation in the presence of operational faults, faulty hardware must still be recovered and returned to full functionality quickly and efficiently. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices, because they can be reconfigured around operational faults in the field. This paper describes incremental CAD techniques that allow functional recovery of FPGA design configurations in the presence of single or multiple operational faults. Our preferred approach to fault recovery takes advantage of the device routing hierarchy in architectural families such as Xilinx Virtex [2] and Altera Apex [3] to quickly swap unused logic and routing resources in place of faulty ones within logic clusters. These algorithms can be implemented straightforwardly within a local fault-tolerant system, without the need to access a remote processing location. If initial recovery attempts through localized swapping fail, an incremental router based on the widely used PathFinder maze routing algorithm [10] can be applied remotely in an attempt to form connections between newly allocated logic and interconnect, based on the history of the initial design route.
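The localized swapping step can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Cluster` model, the `recover_cluster` function, and the assumption that any fault-free unused logic element in a cluster can stand in for any other (that is, full intra-cluster connectivity) are hypothetical simplifications introduced here. A `False` result stands for the case where local recovery fails and a remote incremental reroute would be attempted instead.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Cluster:
    """Hypothetical model of one logic cluster with `size` logic-element slots."""
    size: int
    # slot index -> name of the logic function placed there (used slots only)
    placement: Dict[int, str] = field(default_factory=dict)
    # slot indices known to be faulty
    faulty: Set[int] = field(default_factory=set)

    def spare_slot(self) -> Optional[int]:
        """Return the lowest-numbered slot that is unused and fault-free, if any."""
        for slot in range(self.size):
            if slot not in self.placement and slot not in self.faulty:
                return slot
        return None


def recover_cluster(cluster: Cluster, new_faults: Set[int]) -> bool:
    """Move logic off newly faulty slots onto spare slots in the same cluster.

    Returns True if every displaced function found a spare slot, and False
    as soon as the cluster runs out of spares (local recovery failed).
    """
    cluster.faulty |= new_faults
    for slot in sorted(new_faults):
        if slot not in cluster.placement:
            continue  # the faulty slot was unused, so nothing needs to move
        spare = cluster.spare_slot()
        if spare is None:
            return False  # no local spare left; a wider reroute would be needed
        cluster.placement[spare] = cluster.placement.pop(slot)
    return True


# Usage: a 4-slot cluster with two functions placed; slot 1 then fails.
cluster = Cluster(size=4, placement={0: "f0", 1: "f1"})
recovered = recover_cluster(cluster, {1})
```

In this sketch the decision uses only state held for a single cluster, which is the property that would let such a step run locally. A real device would also have to check that the routing into the replacement slot is fault-free and available.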

[1] Robert J. Smith, et al. Performance of Interconnection Rip-Up and Reroute Strategies, 1981, 18th Design Automation Conference.

[2] S. Webber, et al. The Stratus architecture, 1991, Digest of Papers, Fault-Tolerant Computing: The Twenty-First International Symposium.

[3] Dwight D. Hill, et al. A CAD system for the design of field programmable gate arrays, 1991, 28th ACM/IEEE Design Automation Conference.

[4] Robert S. Swarz, et al. Reliable Computer Systems: Design and Evaluation, 1992.

[5] Andrew M. Tyrrell, et al. The yield enhancement of field-programmable gate arrays, 1994, IEEE Trans. Very Large Scale Integr. Syst.

[6] Jason Cong, et al. FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs, 1994, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.

[7] K. W. Bernhardt. Advanced technologies for a command and data handling subsystem in a "better, faster, cheaper" environment, 1995, Proceedings of 14th Digital Avionics Systems Conference.

[8] Carl Ebeling, et al. Placement and routing tools for the Triptych FPGA, 1995, IEEE Trans. Very Large Scale Integr. Syst.

[9] C. L. Liu, et al. Timing driven placement reconfiguration for fault tolerance and yield enhancement in FPGAs, 1996, Proceedings ED&TC European Design and Test Conference.

[10] Jonathan Rose, et al. Generation of synthetic sequential benchmark circuits, 1997, FPGA '97.

[11] Vaughn Betz, et al. VPR: A new packing, placement and routing tool for FPGA research, 1997, FPL.

[12] Dinesh Bhatia, et al. Partial reconfiguration of FPGA mapped designs with applications to fault tolerance and yield enhancement, 1997, FPL.

[13] R. Showstack. Better, faster, cheaper, 1997.

[14] Victor Lee, et al. The RAW benchmark suite: computation structures for general purpose computing, 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186).

[15] Kenneth A. LaBel, et al. Radiation effects on current field programmable technologies, 1997.

[16] Vaughn Betz, et al. Cluster-based logic blocks for FPGAs: area-efficiency vs. input sharing and size, 1997, Proceedings of CICC 97 - Custom Integrated Circuits Conference.

[17] Scott Hauck, et al. Data security for Web-based CAD, 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).

[18] John M. Emmert, et al. Incremental routing in FPGAs, 1998, Proceedings Eleventh Annual IEEE International ASIC Conference (Cat. No.98TH8372).

[19] Vaughn Betz, et al. A fast routability-driven router for FPGAs, 1998, FPGA '98.

[20] Shantanu Dutt, et al. Methodologies for Tolerating Cell and Interconnect Faults in FPGAs, 1998, IEEE Trans. Computers.

[21] Miodrag Potkonjak, et al. On-line fault detection for bus-based field programmable gate arrays, 1998, IEEE Trans. Very Large Scale Integr. Syst.

[22] Russell Tessier. Negotiated A* Routing for FPGAs, 1998.