Formally enhanced runtime verification to ensure NoC functional correctness

As silicon technology scales, modern processors and embedded systems are rapidly shifting towards complex chip multi-processor (CMP) and system-on-chip (SoC) designs, comprising several processor cores and IP components that communicate via a network-on-chip (NoC). As a side effect of this trend, ensuring the correctness of these designs has become increasingly difficult. In particular, the NoC often includes complex features and components to support the communication bandwidth required among the nodes in the system. As a result, design errors in the NoC may go undetected and escape into the final silicon, with a potentially detrimental impact on the overall system. In this work, we propose ForEVeR, a solution that uses formal methods and runtime verification in a complementary fashion to ensure functional correctness in NoCs. Because formal verification scales poorly, it is used only to verify the smaller modules, such as individual router components. We complete the protection against escaped design errors with a runtime technique: a network-level error detection and recovery solution that monitors the traffic in the NoC and protects it against escaped functional bugs affecting the communication paths in the network. To this end, ForEVeR augments the baseline NoC with a lightweight checker network that alerts destination nodes of incoming packets ahead of time. When a bug is detected, as flagged by missed packet arrivals, a recovery mechanism delivers the in-flight data safely to the intended destination via the checker network. ForEVeR's experimental evaluation shows that it can recover from NoC design errors at only 4.8% area cost for an 8×8 mesh interconnect, with a recovery performance cost of less than 30K cycles per functional bug manifestation. Additionally, it incurs no performance overhead in the absence of errors.
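
The detection idea described above (notifications sent ahead of packets, with missed arrivals flagging a bug) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the ForEVeR hardware design: the class name `DestinationChecker`, the counter-plus-timeout policy, and the `timeout` parameter are all hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class DestinationChecker:
    """Hypothetical per-destination monitor (all names here are assumptions).

    It counts notifications received over the checker network and expects
    each one to be matched by a packet arrival on the main NoC.
    """
    timeout: int           # assumed: cycles without progress before flagging a bug
    pending: int = 0       # notifications not yet matched by a packet arrival
    stalled_cycles: int = 0  # cycles elapsed with packets pending and no arrival

    def on_notification(self) -> None:
        # A notification announces that a packet is on its way.
        self.pending += 1

    def on_packet_arrival(self) -> None:
        # An arrival matches one outstanding notification and counts as progress.
        if self.pending > 0:
            self.pending -= 1
        self.stalled_cycles = 0

    def tick(self) -> bool:
        """Advance one cycle; return True if recovery should be triggered."""
        if self.pending == 0:
            self.stalled_cycles = 0
            return False
        self.stalled_cycles += 1
        return self.stalled_cycles >= self.timeout
```

In this sketch, a destination that has been notified but sees no arrival for `timeout` consecutive cycles reports a missed packet, which is the point where a recovery mechanism would take over. With no outstanding notifications the checker never fires, which mirrors the claim that error-free operation is not disturbed.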
