A survey on fault-tolerant application mapping techniques for Network-on-Chip

Abstract: Reliability is becoming a major concern in Network-on-Chip (NoC) design. Several techniques have been proposed in the literature to deal with different types of faults at different levels of a NoC. This paper surveys work on fault-tolerant application mapping techniques for NoCs. It proposes a classification based on the approach adopted to recover from failures, with three main categories: techniques that combine mapping and routing, techniques based on redundancy, and techniques based on task remapping. A performance comparison of the surveyed techniques highlights their differences. The techniques in each category are also discussed, leading to a set of open issues.
