Towards on-chip fault-tolerant communication

As CMOS technology scales down into the deep-submicron (DSM) domain, devices and interconnects become subject to new types of malfunctions and failures that are harder to predict and avoid with current system-on-chip (SoC) design methodologies. Relaxing the requirement of 100% correct operation drastically reduces design costs but, at the same time, requires that SoCs be designed with some degree of system-level fault tolerance. In this paper, we introduce a high-level model of DSM failure patterns and propose a new communication paradigm for SoCs, namely stochastic communication. Specifically, for a generic tile-based architecture, we propose a randomized algorithm that not only separates computation from communication, but also provides the required tolerance to on-chip failures. This new technique is easy and cheap to implement in SoCs that integrate a large number of communicating IP cores.
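
The abstract does not spell out the randomized algorithm, so the following is only an illustrative sketch of one way stochastic communication on a tile grid could behave, not the paper's method. It assumes a gossip-style scheme in which every informed tile forwards the message to each of its four mesh neighbours with some probability, and each transmission may be lost. The function name `gossip_broadcast` and the parameters `p_forward`, `p_fail`, and `max_rounds` are hypothetical and chosen purely for illustration.

```python
import random


def gossip_broadcast(rows, cols, source, p_forward=0.5, p_fail=0.1,
                     max_rounds=50, seed=0):
    """Simulate a gossip-style broadcast on a rows x cols tile grid.

    Assumed model (not taken from the paper): in every round, each
    informed tile forwards the message to each mesh neighbour with
    probability p_forward, and each transmission is lost with
    probability p_fail.

    Returns (set of informed tiles, number of rounds simulated).
    """
    rng = random.Random(seed)
    informed = {source}
    for round_no in range(1, max_rounds + 1):
        newly_informed = set()
        for (r, c) in informed:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue  # neighbour lies outside the grid
                if rng.random() >= p_forward:
                    continue  # tile chose not to forward this round
                if rng.random() < p_fail:
                    continue  # transmission lost to a link failure
                newly_informed.add((nr, nc))
        informed |= newly_informed
        if len(informed) == rows * cols:
            return informed, round_no  # every tile has the message
    return informed, max_rounds


if __name__ == "__main__":
    tiles, rounds = gossip_broadcast(4, 4, source=(0, 0))
    print(f"{len(tiles)} of 16 tiles informed after {rounds} rounds")
```

In this sketch, redundancy comes from repeated probabilistic forwarding: a message lost on one link in one round can still arrive later or through another neighbour, which is the intuition behind tolerating failures without per-link retransmission logic.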
