Adaptive fault recovery for networked reconfigurable systems

The device-level size and complexity of reconfigurable architectures make fault tolerance an important concern in system design. In this paper, we introduce a fully automated fault recovery system for networked systems that contain FPGAs (field-programmable gate arrays). If a fault is detected that cannot be addressed locally, fault information is transferred to a reconfiguration server. The server recompiles the design to avoid the fault and returns a new FPGA configuration to the remote system, where computation is reinitiated. To illustrate the benefit of this approach, we have implemented a complete fault recovery system that requires no manual intervention. An important part of the system is a timing-driven incremental router for Xilinx Virtex devices. This router is directly interfaced to Xilinx JBits and uses no CAD tools from the standard Xilinx Alliance tool flow. Our completed system has been applied to three benchmark designs and exhibits complete fault recovery in up to 12x less time than the standard incremental Xilinx PAR flow.
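The recovery sequence described above (attempt local repair, otherwise send fault information to a reconfiguration server and receive a new configuration) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names FaultReport, Configuration, ReconfigurationServer, and recover are hypothetical, and the "recompilation" step only records which resources to avoid. It does not model the paper's incremental router, JBits, or any network transport.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FaultReport:
    """Hypothetical fault description sent by a remote system."""
    device_id: str
    faulty_resources: frozenset  # e.g. names of unusable wires or logic blocks


@dataclass(frozen=True)
class Configuration:
    """Hypothetical stand-in for a new FPGA configuration."""
    version: int
    avoided_resources: frozenset


class ReconfigurationServer:
    """Stand-in server; 'recompiling' just accumulates resources to avoid."""

    def __init__(self) -> None:
        self._avoid: Dict[str, frozenset] = {}
        self._version: Dict[str, int] = {}

    def recompile(self, report: FaultReport) -> Configuration:
        # Merge the newly reported faults with those already known for the device.
        known = self._avoid.get(report.device_id, frozenset())
        avoid = known | report.faulty_resources
        version = self._version.get(report.device_id, 0) + 1
        self._avoid[report.device_id] = avoid
        self._version[report.device_id] = version
        return Configuration(version=version, avoided_resources=avoid)


def recover(
    report: FaultReport,
    repair_locally: Callable[[FaultReport], bool],
    server: ReconfigurationServer,
) -> Optional[Configuration]:
    """Return a new configuration, or None if the fault was handled locally."""
    if repair_locally(report):
        return None
    # Local repair failed: hand the fault to the server and await a new configuration.
    return server.recompile(report)
```

In this sketch, a caller would load the returned configuration and restart computation; a None result means no server round trip was needed.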
