Building reliable systems-on-chip in nanoscale technologies

Modern application-specific integrated circuits (ASICs) contain complete systems on a single die, composed of many processing elements that communicate over a dedicated router-based on-chip network. As systems-on-chip comprise billions of transistors with feature sizes in the range of 10 nm, reliable operation cannot be established without carefully engineered support at all levels, from the technology through the circuit layer to the system layer. This article surveys contributions of research groups at TU Wien to this field. At the lower levels of abstraction, they range from the generation of fault models for simulation that closely match reality yet remain efficient to use, to circuit-level radiation-tolerance techniques. At the level of on-chip networks, novel fault-tolerant routing algorithms are being developed together with architectural techniques to isolate faulty parts while keeping the healthy parts connected and active. The article briefly portrays the associated research activities and summarizes their most relevant results.
