As CMOS technology scales down into the deep-submicron (DSM) domain, devices and interconnects become subject to new types of malfunctions and failures that are harder to predict and avoid with current system-on-chip (SoC) design methodologies. Relaxing the requirement of 100% correctness in operation drastically reduces design costs but, at the same time, requires that SoCs be designed with some degree of system-level fault tolerance. In this paper, we introduce a high-level model of DSM failure patterns and propose a new communication paradigm for SoCs, namely stochastic communication. Specifically, for a generic tile-based architecture, we propose a randomized algorithm that not only separates computation from communication but also provides the required tolerance to on-chip failures. This new technique is easy and cheap to implement in SoCs that integrate a large number of communicating IP cores.