Column-Based Precompiled Configuration Techniques for FPGA Fault Tolerance

The abundance of configurable logic elements and routing resources in recent Field-Programmable Gate Arrays (FPGAs) provides a cost-effective means of tolerating permanent faults. Once a permanent fault occurs, the FPGA can be reconfigured so that previously unused resources in the same device replace the faulty part. In this paper, we present two column-based precompiled configuration techniques for tolerating permanent faults in FPGA-based systems. Because alternative configuration versions are compiled in the design phase, these approaches enable fast reconfiguration and thus greatly increase system availability. In addition, intentional similarities are created among the different configuration versions, so that the storage overhead of the precompiled configurations is reduced by orders of magnitude through differential coding and run-length coding. Experimental and analytical results show that our approaches achieve significant dependability improvement with small configuration storage overhead.
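The storage reduction described above can be illustrated with a minimal Python sketch. It assumes each configuration version is an equal-length byte string, uses a byte-wise XOR against a base configuration as the differential code, and follows it with a simple byte-level run-length code. These formats, and the hypothetical helpers `build_store` and `reconfigure_for_fault` that index versions by the column they leave unused, are illustrative assumptions rather than the paper's actual encoding or storage layout.

```python
from itertools import groupby


def diff_encode(base: bytes, alt: bytes) -> bytes:
    """Differential coding: XOR an alternative configuration with the base.

    Bytes the two versions share become 0x00. XOR is its own inverse, so the
    same function also restores the alternative from the base and difference.
    """
    if len(base) != len(alt):
        raise ValueError("configurations must have equal length")
    return bytes(b ^ a for b, a in zip(base, alt))


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Run-length coding: collapse each run of identical bytes to (value, count)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(data)]


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (value, count) pairs back into the original byte string."""
    return b"".join(bytes([value]) * count for value, count in runs)


def compress_alternative(base: bytes, alt: bytes) -> list[tuple[int, int]]:
    """Store an alternative version as the run-length-coded XOR difference."""
    return rle_encode(diff_encode(base, alt))


def restore_alternative(base: bytes, runs: list[tuple[int, int]]) -> bytes:
    """Rebuild the alternative version from the base and its stored runs."""
    return diff_encode(base, rle_decode(runs))


# Hypothetical layout: one precompiled version per column it leaves unused.
def build_store(base: bytes, versions: dict[int, bytes]) -> dict[int, list[tuple[int, int]]]:
    return {col: compress_alternative(base, alt) for col, alt in versions.items()}


def reconfigure_for_fault(base: bytes, store: dict[int, list[tuple[int, int]]],
                          faulty_column: int) -> bytes:
    """Decode the precompiled version that leaves the faulty column unused."""
    return restore_alternative(base, store[faulty_column])


if __name__ == "__main__":
    base = bytes(range(256)) * 4          # 1024-byte stand-in bitstream
    alt = bytearray(base)
    alt[300:304] = b"\xff" * 4            # alternative differs in 4 bytes
    store = build_store(base, {2: bytes(alt)})
    print(len(store[2]))                  # 6 runs instead of 1024 bytes
    assert reconfigure_for_fault(base, store, 2) == bytes(alt)
```

Because intentionally similar versions differ from the base in only a few places, their XOR difference is dominated by long zero runs, which the run-length code collapses; in the demo, a 1024-byte alternative is stored as six runs. Recovery is a decode and a lookup rather than a fresh place-and-route, which is the source of the fast reconfiguration the abstract describes.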
