A memory coherence technique for online transient error recovery of FPGA configurations

The partial reconfiguration feature of some current-generation Field Programmable Gate Arrays (FPGAs) can improve dependability by detecting and correcting errors in on-chip configuration data. Such an error recovery process can be executed online with minimal interference with user applications. However, because Look-up Tables (LUTs) in the Configurable Logic Blocks (CLBs) of FPGAs can also implement memory modules for user applications, a memory coherence issue arises: the online configuration data recovery process may alter memory contents that belong to user applications. In this paper, we investigate this memory coherence problem and propose a memory coherence technique that does not impose extra constraints on the placement of memory-configured LUTs. Theoretical analyses and simulation results show that the proposed technique guarantees memory coherence with a very small (on the order of 0.1%) execution time overhead in user applications.
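The coherence hazard described above can be illustrated with a toy model. The sketch below is not the technique proposed in this paper, whose details the abstract does not give; it only contrasts a naive recovery pass with a simple skip-the-memory workaround. All names (`GOLDEN`, `RAM_LUTS`, `naive_scrub`, `masked_scrub`) and the three-LUT layout are hypothetical, and real devices rewrite configuration data at a coarser granularity than a single LUT.

```python
# Toy model (assumption): one configuration region holding three 4-bit LUTs.
GOLDEN = [0b1010, 0b0110, 0b0000]  # hypothetical reference configuration
RAM_LUTS = {2}                     # hypothetical: LUT 2 is used as user memory


def naive_scrub(frame, golden):
    """Restore every LUT from the reference copy, ignoring how it is used."""
    return list(golden)


def masked_scrub(frame, golden, ram_luts):
    """Restore logic LUTs only; leave memory-configured LUTs untouched.

    This simple workaround also leaves the memory LUTs without recovery,
    so it is shown only as a contrast, not as a complete solution.
    """
    return [frame[i] if i in ram_luts else golden[i] for i in range(len(frame))]


frame = list(GOLDEN)
frame[2] = 0b1111    # the user application writes data into the LUT memory
frame[0] ^= 0b0001   # a transient error flips one bit in a logic LUT

naive = naive_scrub(frame, GOLDEN)
masked = masked_scrub(frame, GOLDEN, RAM_LUTS)

print(f"naive : logic LUT {naive[0]:04b}, memory LUT {naive[2]:04b}")
print(f"masked: logic LUT {masked[0]:04b}, memory LUT {masked[2]:04b}")
```

In this model both passes repair the flipped bit in the logic LUT, but the naive pass also overwrites the value the application stored in LUT 2, which is the incoherence the abstract refers to.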
