Towards Dependable Network-on-Chip Architectures

Aggressive semiconductor technology scaling makes it possible to double the number of transistors on a single chip roughly every 18 months. To utilize these vast chip resources efficiently, Multi-Processor Systems-on-Chip (MPSoCs) integrated with a Network-on-Chip (NoC) communication infrastructure have been widely investigated. However, transistor miniaturization also significantly increases the likelihood of transient and permanent faults inside the chip, especially in NoCs, which are geometrically spread over the entire chip area. To provide dependable communication service, the NoC must maintain its functionality and gracefully degrade its performance in the presence of faults. In this dissertation, we propose several novel NoC-tailored mechanisms that tolerate faults induced by, e.g., variability, ageing, and environmental aggression factors, and that efficiently utilize the NoC components that remain functional. We first introduce a low-cost method that allows correct flit transmission even when soft errors occur in the router control plane. We then propose a Flit Serialization (FS) strategy that tolerates broken link wires and efficiently utilizes the remaining link bandwidth. Within the FS framework, heavily defective links whose fault level exceeds a given threshold are deactivated to relieve congestion in their upstream routers. Moreover, we design a distributed logic-based routing algorithm that tolerates completely broken links and efficiently utilizes UnPaired Functional (UPF) links in partially defective interconnects. We also introduce a link-bandwidth-aware run-time task mapping algorithm that improves the mapping quality for applications newly injected into the MPSoC.
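As an illustration only, the following is a minimal sketch of how flit serialization with threshold-based link deactivation could work. It assumes a 32-bit link split into four 8-bit sections, treats a section as usable only if none of its wires is faulty, and uses a hypothetical threshold of two faulty sections. The constants and the functions `serialize`, `deserialize`, `link_active` and `cycles_per_flit` are illustrative assumptions, not the dissertation's actual design.

```python
from math import ceil

# Hypothetical parameters, chosen for illustration only.
LINK_WIDTH = 32                      # flit and link width in bits
NUM_SECTIONS = 4                     # link split into equal-width sections
SECTION_WIDTH = LINK_WIDTH // NUM_SECTIONS
FAULT_THRESHOLD = 2                  # deactivate if more sections than this are faulty


def functional_sections(faulty_wires):
    """Indices of sections that contain no faulty wire (assumed usability rule)."""
    bad = {w // SECTION_WIDTH for w in faulty_wires}
    return [s for s in range(NUM_SECTIONS) if s not in bad]


def link_active(faulty_wires):
    """A link stays active only while its fault level is within the threshold."""
    faulty = NUM_SECTIONS - len(functional_sections(faulty_wires))
    return faulty <= FAULT_THRESHOLD


def cycles_per_flit(faulty_wires):
    """Number of link cycles needed to send one flit over the functional sections."""
    return ceil(NUM_SECTIONS / len(functional_sections(faulty_wires)))


def serialize(flit, faulty_wires):
    """Split a flit into section-sized chunks and map them onto functional
    sections only; returns one {section: chunk} dict per link cycle."""
    good = functional_sections(faulty_wires)
    if not good or not link_active(faulty_wires):
        raise ValueError("link deactivated")
    mask = (1 << SECTION_WIDTH) - 1
    chunks = [(flit >> (i * SECTION_WIDTH)) & mask for i in range(NUM_SECTIONS)]
    cycles = []
    for start in range(0, NUM_SECTIONS, len(good)):
        cycles.append(dict(zip(good, chunks[start:start + len(good)])))
    return cycles


def deserialize(cycles, faulty_wires):
    """Reassemble the flit at the downstream router, reading sections in the
    same order the upstream router used."""
    good = functional_sections(faulty_wires)
    flit, index = 0, 0
    for cycle in cycles:
        for section in good:
            if section in cycle:
                flit |= cycle[section] << (index * SECTION_WIDTH)
                index += 1
    return flit
```

For example, with faulty wires 3 and 20, sections 0 and 2 are disabled, so a 32-bit flit travels over the two remaining sections in two cycles instead of one. Faulty wires 0, 8 and 16 disable three sections, which exceeds the assumed threshold, so `serialize` refuses to use the link and traffic would have to be routed around it.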
Finally, we discuss the application of the aforementioned strategies to 3D NoC systems and propose a Bus Virtual channel Allocation (BVA) mechanism that enables vertical wormhole switching and thereby improves the performance of 3D NoC-Bus hybrid systems. All proposals are evaluated on our mixed-language NoC simulation platform, and experimental results demonstrate their advantages over state-of-the-art counterparts.
