Automatic systems of fault-tolerant VLSI systems

Advances in VLSI technology are making it feasible to design area-efficient, hardware-oriented approaches to fault tolerance. To realize such high-quality, area-efficient ICs without encumbering the designer, fault-tolerance constraints should be built explicitly into a computer-aided design methodology. In this dissertation, the area overhead of fault tolerance is aggressively optimized by exploiting the symbiosis between the fault-tolerance mechanisms and the design abstractions in which they are incorporated. The distinctive contributions of the thesis are as follows.
The non-permanent nature of transient faults makes them amenable to recovery via checkpointing and rollback. Based on this principle, we have devised an algorithm for coactive scheduling (of operations to clock cycles and data transfers to clock cycle boundaries) and checkpoint determination during self-recovering microarchitecture synthesis, so that fault recovery is supported in hardware. Since transient faults are a major source of error in most application environments, we expect that self-recovering VLSI systems will be in great demand in the near future.
The simplest fault-detection scheme involves straightforward duplication of a computation followed by voting. However, it entails substantial hardware overhead. We have incorporated an alternative, area-efficient fault-detection strategy into microarchitectural synthesis, in which intermediate computations are carefully selected and voted upon so as to improve hardware utilization and thereby minimize the overall hardware.
The reliability of a VLSI system can be enhanced by injecting redundancy into its constituent modules. We have devised synthesis algorithms that trade modular redundancy either for system throughput or for chip area to maximize system reliability. These algorithms, which are based on greedy iterative improvement, can be used to design the reliable VLSI systems that mission-critical applications require.
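The idea of trading chip area for reliability by greedy iterative improvement can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a series system in which each module is replicated as simple parallel copies (a module works if any copy works), ignores voter and interconnect overhead, and uses hypothetical names (`greedy_allocate`, `area_budget`) and made-up module data.

```python
from math import prod


def module_reliability(r, copies):
    # Assumed model: a module fails only if all of its copies fail.
    return 1.0 - (1.0 - r) ** copies


def system_reliability(modules, copies):
    # Assumed model: the system works only if every module works (series).
    return prod(module_reliability(r, k) for (r, _), k in zip(modules, copies))


def greedy_allocate(modules, area_budget):
    """modules: list of (reliability, area) pairs for a single copy.

    Starting from one copy of each module, repeatedly add the copy that
    gives the largest reliability gain per unit area and still fits in
    the budget. Stops when no affordable addition improves reliability.
    """
    copies = [1] * len(modules)
    used = sum(area for _, area in modules)
    if used > area_budget:
        raise ValueError("budget too small for one copy of each module")
    while True:
        base = system_reliability(modules, copies)
        best, best_score = None, 0.0
        for i, (_, area) in enumerate(modules):
            if used + area > area_budget:
                continue  # this extra copy does not fit
            trial = copies[:]
            trial[i] += 1
            score = (system_reliability(modules, trial) - base) / area
            if score > best_score:
                best, best_score = i, score
        if best is None:
            return copies, base
        copies[best] += 1
        used += modules[best][1]


if __name__ == "__main__":
    # Hypothetical data: (single-copy reliability, area in arbitrary units).
    example = [(0.9, 2.0), (0.99, 1.0)]
    print(greedy_allocate(example, area_budget=5.0))
```

With the hypothetical data above, the extra area goes to a second copy of the less reliable module, because that move yields the largest gain per unit area. A greedy pass of this kind is fast but not guaranteed to find the optimal allocation.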
The tolerance of a VLSI IC to fabrication-time spot defects can be enhanced in a straightforward manner by using conservative design rules. However, the additional area this requires is significant. Consequently, we have developed an area-efficient, defect-tolerant layout synthesis system that disperses nets with large overlaps.
In summary, the thesis investigates the problems arising from the synergies between fault-tolerant computing, emerging VLSI technologies, and CAD, pioneering a new research area of immediate practical relevance.